The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Alert Fatigue: The Silent Reliability Killer in Modern IT Operations

By Doreen Jacobi, CEO of Derdack Corp.

Modern IT environments generate a high volume of alerts intended to improve detection and response. However, increasing alert volume does not necessarily improve operational outcomes. Alert fatigue is not simply a function of quantity. It is a predictable consequence of how humans process repeated stimuli, manage limited cognitive resources, and make decisions under sustained load.

Who's on call? How Claude helped us calculate this 2,500x faster

Schedules are a core part of any on-call system. In ours, they define who to page and when. But people use them in lots of other ways too: checking their next shift, asking for cover while at the gym, keeping a Slack user group up to date, or updating a Linear triage responsibility. For many of our customers, they’re one of the main ways they interact with our product, and as they’re such a foundational part of On-call, it’s very important they work well.

SLAs, SLOs, SLIs, and KPIs

The incident is over. The service is back up. The monitoring dashboard is green, the on-call engineer has stood down, and the post-incident review is on the calendar for Thursday. But there is a question that separates good operations teams from great ones: do you actually know what that incident cost you in terms of reliability commitments? Whether you breached an SLO. Whether a customer-facing SLA is now at risk.
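
One way to answer the SLO question is to compare the downtime from incidents against the error budget the SLO allows. The following is a minimal sketch, assuming a hypothetical 99.9% availability SLO measured over a 30-day window; the function names and the incident durations are illustrative and not taken from any product mentioned here.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Minutes of unavailability an availability SLO permits over the window."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (1.0 - slo_target)


def budget_remaining(slo_target: float, downtime_minutes: list[float],
                     window_days: int = 30) -> float:
    """Error budget left after subtracting incident downtime.

    A negative result means the SLO was breached within the window.
    """
    return error_budget_minutes(slo_target, window_days) - sum(downtime_minutes)


if __name__ == "__main__":
    # Hypothetical incidents this window, in minutes of downtime.
    incidents = [12.0, 25.0]
    remaining = budget_remaining(0.999, incidents)
    status = "breached" if remaining < 0 else "within budget"
    print(f"Remaining error budget: {remaining:.1f} min ({status})")
```

Under these assumptions, a customer-facing SLA would typically be set looser than the internal SLO, so a shrinking budget serves as an early warning before the contractual commitment is at risk.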

What is IT incident management? How does agentic ITOps help?

Imagine you’re in the middle of a critical project, and suddenly, your system crashes. Or it’s the middle of the night, and your server goes down, affecting countless users. While no enterprise can avoid all IT incidents, how you handle them can significantly reduce their impact. Fast, effective IT incident management is critical, as major incidents are increasingly costly.

SRE agent vs. traditional engineer: 7 key differences

The role of a Site Reliability Engineer (SRE) is evolving. The focus has shifted away from simply working harder during an outage. A new kind of teammate is here to help: the SRE agent. But what are the key differences when you compare an SRE agent with a traditional site reliability engineer? This isn’t just a superficial change. It signifies a fundamental shift in how teams build and sustain dependable services.

Do Hospitals Still Use Pagers in 2026? Why They're Not Secure (And What's Replacing Them)

Are hospitals still using pagers in 2026? The answer might surprise you. In this video, we break down why hospital pagers are still used today, the security risks of pagers, and whether they meet HIPAA compliance standards. While pagers have long been trusted for their reliability, many healthcare organizations are now re-evaluating their role in modern clinical communication. We also explore why pagers are considered insecure, including the lack of encryption, no read receipts, and limited communication capabilities, all of which can impact patient care and coordination.

Best Call Routing Software for On-Call Teams in 2026 (After-Hours & Emergency Routing)

Most teams don’t go looking for “call routing software.” They’re trying to solve something more immediate: calls coming in after hours, no clear owner, and something important getting missed.

Do Hospitals Still Use Pagers in 2026? Pager Replacements

Remember the small rectangular devices that could receive short messages? Some may think of them as outdated devices that people have long forgotten about, while others still use them to this day. Pagers, although becoming less and less relevant, are still used by many large hospitals that deem them an essential tool for their day-to-day critical communication. But in 2026, are there pager replacements on the market?

What Is Mean Time to Resolve (MTTR)? (And How to Improve It)

Every minute a network incident goes unresolved costs your company money: lost productivity, missed SLAs, degraded user experience, and, in some cases, direct revenue loss. For IT teams and network admins, the pressure to resolve incidents fast isn't just operational; it's existential.
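
As a rough illustration, mean time to resolve can be computed as the average of each incident's resolved time minus its opened time. This is a sketch under the assumption that every incident is recorded as an (opened, resolved) timestamp pair; the function name and the sample data are invented for the example.

```python
from datetime import datetime, timedelta


def mean_time_to_resolve(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Average duration from opened to resolved across resolved incidents."""
    if not incidents:
        raise ValueError("at least one resolved incident is required")
    # Sum the per-incident durations, starting from a zero timedelta.
    total = sum((resolved - opened for opened, resolved in incidents), timedelta())
    return total / len(incidents)


if __name__ == "__main__":
    # Hypothetical incident log: (opened, resolved).
    log = [
        (datetime(2026, 1, 5, 9, 0), datetime(2026, 1, 5, 9, 30)),
        (datetime(2026, 1, 8, 22, 15), datetime(2026, 1, 8, 23, 45)),
    ]
    print(f"MTTR: {mean_time_to_resolve(log)}")
```

Teams differ on whether the clock starts at detection, at ticket creation, or at the first customer impact, so the choice of the opened timestamp should be stated alongside any MTTR figure.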