Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Playbooks in Action: Creating Effective, Repeatable Incident Resolution Workflows

While service incidents can be wildly dissimilar, they tend to have one thing in common: a need for quick resolution. Response teams need a robust, repeatable process to follow that ensures fast, mistake-free execution, especially for those 4 AM calls. Having a documented checklist saved where the entire team can access and use it at any time could make the difference between quick resolution or compounding the problem.

Outage or Breach - Confront with Confidence (2021)

A Recent Dice Article Titled – Data Breach Costs: Calculating the Losses referenced a 2021 IBM and Ponemon Institute study that looked at nearly 525 organizations in 17 countries and regions that sustained a breach last year, and found that the average cost of a data breach in 2020 stood at $3.86 million.

Reliable incident alerting for critical IT systems at German health insurance provider Debeka

“Thanks to Enterprise Alert and the acknowledgement function, we can track the alerting and response digitally and have the certainty that our employees always take care of incidents in our critical IT infrastructure in a timely manner. IT alerting with Derdack, which has to be documented according to BaFin KRITIS, is highly reliable.”, Markus Reusch, Product Owner Monitoring, Debeka

Announcing Grafana OnCall, the easiest way to do on-call management

A critical part of managing modern software development is setting up and running an on-call rotation. But that often involves significant toil, in part because many of the existing tools are cumbersome and not developer-friendly. That’s why we’re excited to announce Grafana OnCall, an easy-to-use on-call management tool that will help reduce toil in on-call management through simpler workflows and interfaces tailored for devs.

Now you see me, now you don't: feature-flagging with LaunchDarkly at incident.io

At incident.io, we ship fast. We're talking multiple times a day, every day (yes, including Fridays). Once I merge a pull request (PR), my changes rocket their way into production without me lifting a finger. 💅 It's when we tackle larger projects that this becomes a bit more complicated. We recently launched Announcement Rules, which let you configure which channels incident announcements are posted in depending on criteria you define.

Your Ops and DevOps teams need to work together, and fast. Who you gonna call?

The world is moving fast, led by an ever-accelerating IT landscape. In recent years, two distinct types of teams have emerged that assist in driving this business transformation: DevOps/SRE teams that are in charge of driving rapid innovation of products and services, and IT Ops/NOC teams that focus on preventing outages and maintaining the high level of quality, reliability and serviceability that modern, discerning customers expect.

How Playbooks improve customer service delivery, agent productivity

We all know one bad experience can impact a customer’s perception of—and even willingness to deal with—an organization going forward. That’s why so many companies, in virtually every industry, have made investing in customer experience (CX) a top priority, according to ResearchAndMarkets.com. The problem is, for any given organization, there are a number of customer service processes along the entire life span of an interaction that need to be looked at and made great.

Make sense of complex systems with Dynamic Service Graph by PagerDuty

The Dynamic Service Graph breaks down silos between teams and provides organizations with a living, breathing asset that displays technical and business services and their relationships at scale. It allows teams to quickly grasp the state of services, visually digest the full impact radius of an issue, zero in on likely cause, and seamlessly facilitate cross-team collaboration.