Incident Management

The latest news and information on Incident Management, On-Call, Incident Response, and related technologies.

Logs and tracing: not just for production, local development too

Nov 11, 2021 By Lawrence Jones In Incident.io

We're a small team of engineers right now, but each engineer has experience working at companies that invested heavily in observability. While we can't afford months of time dedicated to our tooling, we want to come as close as possible to what we know is good while running as little as we can, ideally buying rather than building. Even with these constraints, we've been surprised at just how good we've managed to get our setup.

Avoid frostbite: Stop doing code freezes

Nov 11, 2021 By Robert Ross In FireHydrant

As the holiday season aggressively approaches, I want to make a public service announcement for everyone toying with the idea of a code freeze for the holidays: please don't. It’s getting cold outside and the season of peppermint mochas is upon us, which might get you thinking about putting a code freeze in place for the holidays. A word of warning: instituting a code freeze may have unintended consequences.

Playbooks in Action: Creating Effective, Repeatable Incident Resolution Workflows

Nov 10, 2021 By Elli Ludwigson In Mattermost

While service incidents can be wildly dissimilar, they tend to have one thing in common: a need for quick resolution. Response teams need a robust, repeatable process that ensures fast, mistake-free execution, especially for those 4 AM calls. Having a documented checklist saved where the entire team can access and use it at any time could make the difference between resolving the problem quickly and compounding it.

Microservice Architecture | What It Is & Why It Matters

Nov 10, 2021 By Noor-ul-Anam Ruqayya In Blameless

Curious about microservice architecture? We explain what microservice architecture is and how it can be used to quickly produce reliable, lightweight applications.

Viewing Your Devices on iOS - xMatters Support

Nov 10, 2021 By xMatters In xMatters

Join Chris Patch, xMatters’ Senior eLearning Specialist, as he navigates you through the My Devices screen in the xMatters app for iOS devices.

4 Recommendations for Optimizing DevOps

Nov 10, 2021 By xMatters In xMatters

The concept and development of DevOps have significantly changed the way IT teams work in the last decade. Small and large teams alike can see the difference when they switch from traditional software development cycles to a DevOps cycle: accelerated innovation, improved collaboration, faster time to market. And the list of benefits continues to grow. Effectively embracing DevOps, however, is not an easy task. Thankfully, there are ways to navigate this challenging journey.

Outage or Breach - Confront with Confidence (2021)

Nov 10, 2021 By AlertOps In AlertOps

A recent Dice article titled "Data Breach Costs: Calculating the Losses" referenced a 2021 IBM and Ponemon Institute study that looked at nearly 525 organizations in 17 countries and regions that sustained a breach last year, and found that the average cost of a data breach in 2020 stood at $3.86 million.

Reliable incident alerting for critical IT systems at German health insurance provider Debeka

Nov 10, 2021 By Derdack In Derdack

“Thanks to Enterprise Alert and the acknowledgement function, we can track the alerting and response digitally and have the certainty that our employees always take care of incidents in our critical IT infrastructure in a timely manner. IT alerting with Derdack, which has to be documented according to BaFin KRITIS, is highly reliable,” says Markus Reusch, Product Owner Monitoring, Debeka.

How to improve your influence as an SRE

Nov 10, 2021 By Ricardo Castro In Squadcast

Improving your influence within the company will help you deliver high-quality work, as your goals will be closely aligned with those of the company. In this blog post, Ricardo explains how to improve your influence as an SRE. Balancing fast-paced business requirements with the demands of keeping production services stable is not an easy task.

Announcing Grafana OnCall, the easiest way to do on-call management

Nov 9, 2021 By Matvey Kukuy In Grafana

A critical part of managing modern software development is setting up and running an on-call rotation. But that often involves significant toil, in part because many of the existing tools are cumbersome and not developer-friendly. That’s why we’re excited to announce Grafana OnCall, an easy-to-use on-call management tool that will help reduce toil in on-call management through simpler workflows and interfaces tailored for devs.
