The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

How generative AI facilitates ITOps modernization

IT teams need immediate and automatic access to machine data and institutional knowledge to move faster and make the right decisions. And they need context to identify incidents and understand how to resolve them. AIOps enables this by transforming noisy and fragmented operations data into actionable insights. This is the foundation of full-context operations. Full-context operations combines observability and other machine-generated data with historical, expert, and institutional knowledge.

OnCall Management (2025)

Your enterprise may have oncall employees across various departments who can help your business when problems arise, even outside normal operating hours. How you manage your oncall teams can have significant ramifications for your enterprise and its stakeholders. To understand why, let’s look at what it means to be “oncall,” along with tips and recommendations to help your enterprise staff achieve its desired results.

Grafana Incident: new tools for faster, simpler incident response

At Grafana Labs, we’re committed to helping teams dramatically improve how they manage and respond to incidents. Through Grafana Incident Response & Management (IRM), we provide tools to empower teams, streamline processes, and enhance the effectiveness of incident management strategies—and we’re constantly looking for ways to make our solution even better.

Unveiling the power of AI in incident management

The emergence of AI opens new and innovative possibilities, simplifies operations, and boosts overall success. With AIOps, your technical organization can achieve unparalleled efficiency, productivity, and profitability. This cutting-edge technology leads us toward a brighter, more prosperous future with exciting opportunities to grow and thrive.

Speedrun to Signals: automated migrations are here

When we launched Signals to the world, we were excited to hear how our product resonated with many teams. But with that excitement came an understandable concern: how much time and effort will I have to put in to move from my existing provider to Signals? We hear you — that’s why we built the Signals Migrator tool. And we’re open sourcing it.

Practical lessons for AI-enabled companies

We went live with our first set of AI-enabled features a few months ago. Needless to say, we learned a lot along the way, as this was the first time we had experimented with generative AI. Here, I’ll share some of what we’ve learned as we’ve grappled with using LLMs to power new products at incident.io. This will be most applicable to companies at the application layer: those that are AI-enabled but are not AI companies.

PagerDuty Appoints Eduardo Crespo, Vice President of EMEA

PagerDuty, Inc. announces the appointment of Eduardo Crespo as vice president of EMEA. Crespo will lead PagerDuty's next phase of growth in the region, bringing the PagerDuty Operations Cloud to enterprise customers across EMEA to solve their biggest digital challenges.

Why more low severity incidents can be a good thing #incidentmanagement

In this clip, Dennis Henry of Okta explains why having more low-severity incidents can be a good thing. In last week’s episode of The Debrief, we had on Colette Alexander, Director of Engineering at HashiCorp, to discuss some of the myths around incident response. One of the myths we spoke about was the idea that asking “why” is better than asking “how.” In reality, asking “how” allows you to focus more on the contributing factors that led to an incident, whereas “why” tends to single out a person, which can lead to a lot of blame.