Latest News

Managing Burnout | Tips To Minimize The Impact

Apr 14, 2022 By Blameless Community In Blameless

Burnout is real. Today, the source of burnout can be anything from pandemic fatigue to the onslaught of political divisiveness to simply the pace of life worldwide. Whatever the culprit, we’re living in a stressful time. People working in cloud native environments definitely feel burnt out. Silicon Valley investor Marc Andreessen famously said, “Software is eating the world,” and that seems to be quite true. High demand is fueling churn. System and cloud operators feel pressure.

Accelerate incident investigations with Log Anomaly Detection

Apr 13, 2022 By Nicholas Thomson In Datadog

Modern DevOps teams that run dynamic, ephemeral environments (e.g., serverless) often struggle to keep up with the ever-increasing volume of logs, making it even more difficult to ensure that engineers can effectively troubleshoot incidents. During an incident, the trial-and-error process of finding and confirming which logs are relevant to your investigation can be time-consuming and laborious. This results in employee frustration, degraded performance for customers, and lost revenue.

The Pros and Cons of Embedded SREs

Apr 12, 2022 By Quentin Rousseau In Rootly

To embed or not to embed: That is the question. At least, that’s one of the questions that companies have to answer as they decide how to implement Site Reliability Engineering. They can either embed SREs into existing teams, or they can build a new, separate SRE team. Both approaches have their pros and cons. The right strategy for your company or team depends, of course, on your needs and priorities.

Product update: ensure consistent data across all your retros with two new features

Apr 12, 2022 By Dylan Nielsen In FireHydrant

FireHydrant captures your incident, from declaration through remediation, and gives you a framework to run your retrospectives. But retrospectives are only as effective as their inputs. Now we're delivering a better way to learn from and analyze retrospectives by guaranteeing consistent, structured, and sufficient data from your team.

OnCallogy Sessions

Apr 12, 2022 By Fred Hebert In Honeycomb

Being on call is challenging. It means signing up to operate complex services in a totally interruptible manner, at all hours of the day or night, with limited context. It’s therefore critical to have proper on-call onboarding procedures, offer continuous training sessions, and continuously improve documentation. We also need to make sure people feel safe by providing ways to reduce their stress, and make room for questions to surface all sorts of uncertainties around our operations.

Conflict Management and the Major Incident Management Process

Apr 11, 2022 By InvGate In InvGate

Major incidents are, by their very nature, stressful and intense. The ITIL 4 definition of a major incident is: … High-stress situations can cause conflict that, left unchecked, could delay the fix effort. Since we already have a definitive guide on incident management, this blog post will focus specifically on the major incident management process.

xMatters remains a G2 Grid Report Leader

Apr 8, 2022 By Kerin Munro In xMatters

Worldwide businesses and their technical resources use G2, the leading business solution review platform, to analyze software, gather user feedback, and make informed decisions about technology. Although we value all the recognition we’ve earned on G2 over the years, there’s one that always stands out and makes us feel extra proud of what we’ve accomplished so far.

Debug issues and automate remediation with Shoreline and Datadog

Apr 7, 2022 By Thomas Sobolik In Datadog

Shoreline is an incident response automation service that enables DevOps engineers and site reliability engineers (SREs) to quickly debug and remediate issues at scale and develop automated routines for incident management. Using Shoreline’s proprietary Op language, customers can run debug commands across all their hosts simultaneously and then deploy custom scripts via Actions to trigger automated remediations.

Intelligent Alert Grouping Series Summary

Apr 7, 2022 By Quintessence Anx In PagerDuty

Welcome to the final post in our EI Architecture Series on Intelligent Alert Grouping. I hope you’ve enjoyed this series, and if you’d like to take a look at any of our prior posts, please use the ei-architecture-series tag. Let’s take a moment and recap everything we’ve learned.

Automated Incident Management | Everything You Should Know

Apr 7, 2022 By Noor-ul-Anam Ruqayya In Blameless

Looking into automated incident management? We explain everything you need to know about what automated incident management is, why it’s important, and how to do it.
