
Latest News

How AIOps modernizes CMDBs to drive accuracy and value

Keeping your Configuration Management Database (CMDB) accurate, fully updated, and performing well is a frustrating and elusive goal for ITOps and IT leaders. Aiming for this ‘golden’ CMDB standard can feel like running on a treadmill: you put in a lot of work but remain as distant as ever from your goal. Can IT leaders ever catch up?

What is HCAHPS: A Comprehensive Overview

In the realm of hospitals and healthcare organizations, the term “HCAHPS survey” is a recurrent presence: Hospital Administrator A: “The latest HCAHPS survey results just came out, and patients seem satisfied with…” Hospital Administrator B: “Some of our past patients participated in the HCAHPS survey, but they expressed disappointment with…” You might be left wondering, “What exactly is the HCAHPS survey?” Allow me to elucidate.

Bridging the ITIL vs DevOps Mindset: CI/CD Best Practices for ITIL Organizations

DevOps practices in software development have revolutionized the way updates are released. However, many companies entrenched in ITIL practices find it challenging to seamlessly integrate with the DevOps practice of Continuous Integration and Continuous Delivery/Deployment (CI/CD). This is because ITIL focuses on stability, which suits older systems, while DevOps is ideal for modern setups with its agile, automated practices.

Revolutionizing your Grafana setup with intelligent alerting

Once upon a time, in the bustling city of DataVille, lived a team of dedicated IT professionals tirelessly working to maintain the city’s digital heartbeat. Their mission was to ensure the smooth operation of the city’s digital infrastructure, not only during the day but also beyond business hours. They were the unsung heroes, the guardians of the city’s data. Their tool of choice? Grafana, a powerful open-source platform for observability.

Choosing the Right Career Path in Tech: Software Engineering vs. Site Reliability Engineering (SRE)

The tech industry is booming, and there are many different career paths. Two of the most popular and in-demand roles are Software Engineering (SWE) and Site Reliability Engineering (SRE). SRE blends elements of software engineering with IT operations, focusing on reliability. Software Engineering, on the other hand, involves designing, developing, testing, and deploying software applications.

October 2023 Update - New layout, additional cross links, improved event filtering and much more

Our October update brings a new layout in the web portal, new cross-references from Signl details to linked entities, and improved grouping options for conditions in the distribution rules. As always, all the details are in this blog article.

What is Mean Time Between Failures - and why does it matter for service availability

Mean Time Between Failures (MTBF) measures the average duration between repairable failures of a system or product. MTBF helps us anticipate how likely a system, application, or service is to fail within a specific period, or how often a particular type of failure may occur. In short, MTBF is a vital incident metric that indicates product or service availability (i.e. uptime) and reliability.
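As a rough illustration of the metric, MTBF is commonly computed as total operating time divided by the number of failures in that period. The sketch below assumes that definition and also uses the widely cited steady-state relation availability = MTBF / (MTBF + MTTR). The function names, the sample numbers, and the use of MTTR (mean time to repair) are assumptions for illustration and do not come from the article.

```python
def mtbf_hours(total_operating_hours: float, failure_count: int) -> float:
    """Average operating time between failures (assumed definition)."""
    if failure_count <= 0:
        raise ValueError("MTBF is undefined without at least one failure")
    return total_operating_hours / failure_count


def availability(mtbf: float, mttr: float) -> float:
    """Steady-state availability from MTBF and MTTR (both in the same unit)."""
    return mtbf / (mtbf + mttr)


# Hypothetical numbers: 720 operating hours with 3 repairable failures,
# each taking 2 hours on average to repair.
mtbf = mtbf_hours(720, 3)
print(mtbf)                             # 240.0
print(round(availability(mtbf, 2), 4))  # 0.9917
```

A higher MTBF or a shorter repair time both push availability closer to 1, which is why the two metrics are usually tracked together.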

Enhance Your Customer Service with PagerDuty for ServiceNow CSM

In today’s fast-paced, digital-first landscape, delivering exceptional customer experience is paramount to business success. For customer service teams, that means maintaining service level agreements (SLAs) and ensuring swift responses to customer issues that can make or break your company’s reputation. Fortunately, PagerDuty has built applications into ServiceNow’s CSM platform that improve the way customer service teams handle these issues.

Global Event Rulesets: Streamlining Alert Routing Across Services

In the fast-paced world of organizations handling numerous microservices and projects, tackling the challenges that arise can be a daunting task. Because many of our customers run infrastructures that include a large number of microservices, we set out to make it easier for them to streamline alert source management. Enter Global Event Rulesets (GER). This feature is designed to redefine the way you manage alerts.

The Link Between Early Detection and Internet Resilience: A Lesson from Salesforce's Outage

Almost every study examining the hourly cost of outages leads to the same clear conclusion: outages are expensive. According to a 2016 study, the average cost of downtime was estimated at approximately $9,000 per minute. In a more recent study, 61% of respondents stated that outages cost them at least $100,000, with 32% indicating costs of at least $500,000 and 21% reporting expenses of at least $1 million per hour of downtime.
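To put the per-minute figure on the same scale as the hourly figures, a quick conversion helps. This is a minimal sketch that only restates the 2016 estimate quoted above as an hourly amount; the constant and function names are hypothetical, and it assumes the cost accrues evenly over the hour.

```python
COST_PER_MINUTE_USD = 9_000  # 2016 estimate quoted in the text


def hourly_cost(cost_per_minute: float) -> float:
    """Convert a per-minute downtime cost to a per-hour cost."""
    return cost_per_minute * 60


print(hourly_cost(COST_PER_MINUTE_USD))  # 540000
```

At that rate, an hour of downtime would cost roughly $540,000.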