Incident Management

The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Little Known Ways to Better Use Your Error Budgets

One of the most versatile and foundational SRE tools is the SLO, or service level objective. The SLO is a threshold set for key reliability metrics. When incidents push the metric over the threshold, a response launches to prevent further damage. Conversely, as long as you meet your SLO, you can continue to ship new code. The space you have before you breach this threshold is the error budget.
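The error budget arithmetic described above can be sketched in a few lines. This is only an illustration under stated assumptions: the SLO is taken to be a request success ratio measured over a fixed window, and the names SloWindow, error_budget, budget_remaining, and can_ship are hypothetical, not taken from any particular SRE tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SloWindow:
    """Hypothetical record of one SLO measurement window."""
    slo_target: float     # assumed success-ratio SLO, e.g. 0.999 for 99.9%
    total_requests: int   # requests served during the window
    failed_requests: int  # requests that violated the objective


def error_budget(window: SloWindow) -> float:
    """Number of failed requests the SLO tolerates over the window."""
    return (1.0 - window.slo_target) * window.total_requests


def budget_remaining(window: SloWindow) -> float:
    """Failures still allowed before the SLO is breached (negative if breached)."""
    return error_budget(window) - window.failed_requests


def can_ship(window: SloWindow) -> bool:
    """Assumed policy: keep shipping new code only while budget remains."""
    return budget_remaining(window) > 0


if __name__ == "__main__":
    window = SloWindow(slo_target=0.999, total_requests=1_000_000, failed_requests=400)
    print(f"budget: {error_budget(window):.0f} failed requests")
    print(f"remaining: {budget_remaining(window):.0f} failed requests")
    print(f"can ship: {can_ship(window)}")
```

With a 99.9% target and one million requests, the budget is roughly 1,000 failed requests, so 400 failures leave about 600 in hand. Real policies usually also consider how fast the budget is being consumed, which this sketch leaves out.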

Incident Ready: How to Chaos Engineer Your Incident Response Process - FireHydrant

We’re pretty sure using a real incident to test a new response process is not the best idea. So, how do you test your process ahead of time? In this video, FireHydrant CEO, Robert Ross, will share how FireHydrant customers leverage best practices to break, mitigate, resolve, and fireproof incident processes. We’ll show you how to use chaos engineering philosophies to stress test 3 critical parts of a great process.
Sponsored Post

Boost IT Savings with CloudReady and Incident Workflow

Companies love data. Aggregating data from multiple sources makes decision-making easier and brings new depth to the conversation in business meetings. But all of this is at the management level. IT managers and administrators also search for data from multiple sources to ensure that the ecosystem works. Companies demand the continued maintenance and availability of mission-critical applications. Without a framework or incident workflow, revenue can suffer and customers can churn if the company does not proactively address problems that arise in its infrastructure.

Modern Operations Best Practices from Engineering Leaders at New Relic and Tenable

As reliability shifts left, more companies are adopting SRE best practices. These best practices don’t only include conducting incident retrospectives. The heart and soul of these best practices are a blameless culture and a desire to grow from each incident. In a recent industry leaders’ roundtable hosted by Blameless, top experts discussed how teams can embrace SRE best practices and make cultural shifts towards blamelessness.

Segment and SIGNL4: Know your Customer's Actions, Anywhere and Anytime

Do you have a website, app, online shop, or SaaS offering? Then you have plenty of user actions, such as visiting a certain page, signing up for a service, or canceling a subscription. Wouldn’t it be great to know in real time when an important customer action takes place? This would allow your sales, customer service, or technical teams to act immediately, no matter where they are.

MSP Security Incident Response Planning (a Quick Guide)

Every second counts when it comes to Managed Service Provider (MSP) security: the longer it takes an MSP to complete security incident response, the greater the ramifications of the incident for the service provider and its stakeholders. When faced with a cyber attack, it’s crucial to understand the potential consequences of the security incident. It is also paramount for an MSP to establish a plan, so it can quickly and effectively respond to cyber attacks and other security incidents.

Top Observability tools for DevOps Engineers and SREs

Better visibility is the first step to improved system stability. Our latest blog outlines Top Observability tools for DevOps Engineers & SREs to help you get started on your journey to gain valuable insights into your infrastructure. “We can't fix something which we can't observe”: whether it's a steam engine or a complex microservice-based cloud deployment, great observability makes troubleshooting easier.

How to Replace Your Opsgenie ConnectWise Manage Integration

Looking for a replacement for your Opsgenie ConnectWise Manage integration? Conversations are popping up on Reddit with concerns over the recent deprecation of Opsgenie’s ConnectWise Manage Integration. In case you’re interested, here is more info about that. And, here’s the warning in Opsgenie’s docs.