Incident Management

The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Should you care about AIOps? Obviously.

There's a lot of hype in the marketplace about AIOps right now, and there are a lot of people with interesting ideas about what it should be. The most common idea I hear is that it's essentially a layer of AI magic that sits across everything in your IT tooling today, makes sense of all of it for you, and then decreases the number of incidents you have and reduces your MTTR...

Incident Management Process: 6 Tips to Better Prepare Your IM Process for the Holiday Season

Holiday retail sales are likely to increase between 7% and 9% in 2021, according to Deloitte’s annual holiday retail forecast, with holiday sales totaling $1.28 to $1.3 trillion during the November to January timeframe. Deloitte also forecasts that e-commerce sales will grow by 11-15% year over year during the 2021-2022 holiday season.

Evaluating Opsgenie Alternatives

Atlassian’s Opsgenie is a leading incident alerting and on-call management tool that helps businesses manage their incident response and resolution needs. As part of the Atlassian product suite, Opsgenie has become one of the most popular solutions in the industry. But it’s not the only incident management tool on the market, and when looking at Opsgenie and its alternatives, it’s vital to do a deep dive into their features and abilities.

How Your ITSM Tool & PagerDuty Make a Dynamic Duo for Real-Time Work

There’s an incident. Your teams need to communicate with the development team that owns the service, but that team is too busy to stop and chat. Meanwhile, you in central IT have business leaders asking for updates, angry internal users calling the help desk, and customer service representatives asking for information. You have hundreds of tickets all pertaining to the incident in your ticketing system.

What SREs Can Learn from Facebook's Largest Outage

Facebook’s October 2021 outage was the type of event that gives SREs nightmares: A series of critical business apps crashed in minutes and remained unavailable for hours, disrupting more than 3.5 billion users around the world and costing about 60 million dollars. As incidents go, this was a pretty big one.

PagerDuty Integration Spotlight: Honeycomb

Honeycomb delivers observability for modern engineering and DevOps teams to observe, debug, and improve production systems efficiently. The PagerDuty + Honeycomb integration uses Honeycomb Triggers to notify on-call responders based on alerts sent from Honeycomb. This integration is maintained and supported by Honeycomb. Liz Fong-Jones from Honeycomb joined us live on Twitch to share more about how Honeycomb and PagerDuty can be used together to help your teams and to do some live investigation into Honeycomb’s own performance data.

4 xMatters Use Cases That May Surprise You

xMatters is part technology, part service reliability, and a little bit of magic. If you’ve spent time on the xMatters website, you’ll likely have seen a number of valuable use cases for the platform—it can alert SREs when there’s a website outage, it can accelerate product development for DevOps teams, it can manage on-call schedules and alerts for support teams.

Incident Response: A Step-by-Step Guide to Managing Incidents

Looking into Incident Response? We explain incident response, the end-to-end process, the teams involved, and the steps to take to avoid friction and slowdowns. The goal is to manage the incident as efficiently as possible in order to restore the service to its expected operational state.

The Cost of Increasing Incidents: How COVID-19 Affected MTTR, MTTA, and More

Digital transformation accelerated for many companies during the last 18 months. While it may have been on the agenda prior to COVID-19, teams were pushed to extreme speeds to digitize and meet rising online demand. During this time, organizations learned important lessons that they’ll carry with them into the future. Leaders can take these lessons and use them to build better products, healthier and more efficient teams, and a happier customer base.