Incident Management

The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

PagerDuty Apps for AWS + Automated Diagnostics Demo Highlights (3 min.)

"Reduce downtime and customer impact with service ownership while enabling teams to drive continuous improvement and innovation. Learn about how you can modernize and optimize your operations with our enterprise-grade set of AWS integrations. Automate incident response with PagerDuty’s Runbook Automation and learn about our new set of AWS plugins and prebuilt jobs that make it easier to get up and running with auto-diagnostics."
Sponsored Post

How Adaptive Incident Management Gives You the Upper Hand

One of the great things about the TV detective Columbo was that he never made a hasty decision based on first impressions or appearances at a crime scene. It didn't matter how obvious it seemed to be who committed the crime (or how good the frame-up was): Columbo always dug deeper into motives, opportunities, and methods to uncover who the guilty party was.

BigPanda's new self-service tools are primed to make integration onboarding even faster

BigPanda supports inbound integrations for alert ingestion out of the box; however, many IT organizations have older, rarer or custom-built tools that require a little more work upfront. Fortunately, BigPanda’s recently announced Open Integration Manager and Email Parser aim to streamline integrating these kinds of monitoring tools with the BigPanda platform.

October is National Cybersecurity Awareness Month

It’s National Cybersecurity Awareness Month, and as a Cybersecurity Awareness Month Champion Organization, xMatters is proud to be actively participating. Since the National Cybersecurity Alliance started this initiative in 2004, the number of devices connected to the internet and the amount of time we spend interacting online have increased exponentially. The impact on our lives is so massive that it’s become hard to imagine what life would be like without our devices.

Defining and measuring your SLIs and SLOs

Customers expect online services to be available all the time. The truth is that outages happen to almost everyone, because providing 100% service availability is challenging and costly. Creating a reliable and profitable service is, amongst other things, a matter of finding the balance between application availability, costs and time to market. Faster feature delivery means less availability, as constant changes to production may cause issues and introduce bugs.

Create and Manage Maintenance Windows Through PagerDuty Mobile App

To respond in real time to urgent, critical digital incidents, on-call responders must be able to take action from anywhere. But when on-call responders become overwhelmed with alerts, they often just “ignore them” because they cannot tell the difference between a real alert and a false one.

Product metrics @ incident.io, a year (and a half) in

We’ve been celebrating a few big milestones 🎉 at incident.io in the last few months. We were recently discussing product metrics (as you do for fun on a Friday afternoon 🤓), and Lawrence was very surprised by a particular stat around the number of workflows that have been run using incident.io.

Got an incident? Pull the Andon Cord

The Andon Cord catapulted Toyota into 40 years of unprecedented quality and domination. What is the Andon Cord, and how did they do it? In the early 1900s, Taiichi Ohno architected and introduced the Andon Cord in Toyota's manufacturing plants. The problem: This costs a lot of money. Production costs have always been high. In 1984, it cost NUMMI $15,000 per minute. That's $42,758 in today's value.