Incident Management

The latest news and information on incident management, on-call, incident response, and related technologies.

What's New: Updates to Incident Response, PagerDuty Process Automation Software & PagerDuty Runbook Automation, Integrations, and More!

We’re excited to announce a new set of updates and enhancements to the PagerDuty Operations Cloud. Recent development and app updates from the product team cover Incident Response and PagerDuty® Process Automation, as well as Community & Advocacy Events. We continue to help customers automate further to optimize cloud operations and reduce the number of issues escalated to other teams.

Fast track video series: Extracting alert data from emails

With BigPanda’s self-service Email Parser, extracting alert data from emails has never been simpler. In the latest video in our Fast track series, we explore the benefits of this tool. The parser is ideal for monitoring tools and systems that do not support REST APIs or that rely solely on email to generate and send alerts. So no matter what tools your organization uses, this feature can help you turn all of those alert emails into actionable incidents within BigPanda’s platform.
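
To make the idea of extracting alert data from an email concrete, here is a minimal sketch in Python using only the standard library. It is not BigPanda’s Email Parser and does not reflect how that product works; the function name `parse_alert_email`, the "Key: value" body layout, and the field names `host`, `severity`, and `check` are all assumptions made for illustration.

```python
import re
from email import message_from_string

# Assumed (hypothetical) body layout: one "Key: value" pair per line.
FIELD_RE = re.compile(
    r"^(host|severity|check)[ \t]*:[ \t]*(.+)$",
    re.IGNORECASE | re.MULTILINE,
)


def parse_alert_email(raw: str) -> dict:
    """Turn a plain-text alert email into a flat dictionary of fields."""
    msg = message_from_string(raw)
    body = msg.get_payload()  # a str for simple, non-multipart messages
    alert = {"subject": msg["Subject"]}
    for key, value in FIELD_RE.findall(body):
        alert[key.lower()] = value.strip()
    return alert


# Hypothetical sample alert, as a monitoring tool might send it.
SAMPLE = (
    "Subject: CPU high\n"
    "From: monitor@example.com\n"
    "\n"
    "Host: web-01\n"
    "Severity: critical\n"
    "Check: cpu_load\n"
)

if __name__ == "__main__":
    print(parse_alert_email(SAMPLE))
```

A real parser would also need to handle multipart and HTML messages, which this sketch deliberately leaves out.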

How to build a successful on-call team - incident.fm

In this podcast, our panellists discuss what it means to build a successful on-call team. Drawing on their experiences at fast-growing start-ups and scale-ups, incident.io co-founders Pete and Chris cover everything from who should be on the rota and how to build a compassionate on-call culture, to compensation structures and tips for operationalising on-call.

PagerDuty and DataOps: Enabling Organizations to Improve Decision Making with Better Data

Many organizations have been digitally transforming their operations, and the majority of them are moving to the cloud. With this transformation, data teams have to analyze ever larger and more complex data sets to allow downstream teams to make faster and more accurate decisions on a daily basis. Consequently, most organizations need to work with customer data, product data, usage data, advertising data, and financial data.

Do You Understand Your Essential Business Processes?

Before you can choose the proper tools for your organization, you have to understand its essential business processes. Once you know an essential business process, you can review software applications that will help make your organization more efficient and accurate. Unfortunately, many organizations do not understand their essential business processes. This makes it nearly impossible for them to streamline their organizations, which puts them at a disadvantage in the marketplace.

A deep-dive into event correlation

Event correlation is a powerful capability that can help reduce IT noise, detect incidents in real time, and improve the performance of critical applications and services. Read on for a deep dive into event correlation as we explore everything from its origins to its current state-of-the-art techniques. We’ll also discuss how event correlation fits into the bigger picture of integrated service management.
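
As a rough illustration of how event correlation can reduce noise, the sketch below groups events from the same host into one incident when they arrive within a fixed time window of each other. This is only one simple, time-based approach, not the technique described in the linked article; the `Event` class, the `correlate` function, and the 60-second window are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Event:
    timestamp: float  # seconds since some reference point
    host: str
    message: str


def correlate(events, window=60.0):
    """Group events per host; a gap longer than `window` starts a new incident."""
    incidents = []
    latest_by_host = {}  # host -> most recent incident (a list of events)
    for event in sorted(events, key=lambda e: e.timestamp):
        current = latest_by_host.get(event.host)
        if current and event.timestamp - current[-1].timestamp <= window:
            # Close enough to the previous event on this host: same incident.
            current.append(event)
        else:
            # First event for this host, or the gap is too long: new incident.
            current = [event]
            incidents.append(current)
            latest_by_host[event.host] = current
    return incidents


if __name__ == "__main__":
    sample = [
        Event(0, "web-01", "cpu high"),
        Event(30, "web-01", "latency high"),
        Event(200, "web-01", "disk full"),
        Event(10, "db-01", "replication lag"),
    ]
    for incident in correlate(sample):
        print([e.message for e in incident])
```

Here four raw events collapse into three incidents; production systems typically correlate on more attributes than host and time alone.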

The Roblox Outage

Just before Halloween 2021, Roblox engineers experienced a horror story: a service outage that also took down critical monitoring systems. It seemed like the issue was a hardware problem, but it wasn’t. Users were frustrated, and the clock was ticking. After three full days of downtime, service was finally restored on Halloween day. While the incident itself was an IT nightmare, Roblox’s detailed technical post-mortem several months later was an excellent way to bounce back.
Sponsored Post

Introduction to Automation Testing Strategies For Microservices

Microservices are distributed applications that are deployed in different environments and may be developed in different programming languages, each with its own database and many internal and external communication paths. A microservice architecture depends on multiple interdependent applications for its end-to-end functionality. This complexity requires a systematic strategy for end-to-end (E2E) testing of any given use case. In this blog, we will discuss some of the most widely adopted automation testing strategies for microservices, using the testing triangle approach.

Point Solution Monitoring vs. Domain-Agnostic AIOps. Which is Right for You?

Just consider how much of your day relies on online digital technologies. Perhaps you hopped on an app to pre-order your morning coffee and then logged onto a platform to book a car to work. Or, perhaps you stayed home to work, using digital tools to connect with your colleagues and exchange information.