Incident Management

The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

iGaming: Where Incident Management Meets Compliance

Aug 16, 2020 By Noam Morginstin In Exigence

At a time when players have multiple online choices and competition is fierce, safe betting and social responsibility are at the forefront of brand integrity. In fact, social responsibility has become a competitive edge for leading operators. Enter the era of the regulator. Regulation now defines both the operator’s brand integrity and the player experience. Are online operators up to the regulatory task? Some are, and some are not.

Resilience in Action, E5: Tammy Bryant and Eric Roberts The Importance of Glue Work

Aug 14, 2020 By Blameless Community In Blameless

Resilience in Action is a podcast about all things resilience, from SRE to software engineering, to how it affects our personal lives, and more. Resilience in Action is hosted by Blameless Staff SRE Amy Tobey. Amy has been an SRE and DevOps practitioner since before those names existed. She cares deeply about her community of SREs and wants to take what she’s learned over the 20+ years of her career to help others.

Humanizing a DevOps Transformation

Aug 14, 2020 By Joseph Mandros In PagerDuty

Anyone who’s ever played the game of chess knows there’s more than one way to reach a desired outcome. There are 400 possible setups after the first turn; 197,742 after the second; and just north of 120 million after the third—all of which are marching toward the same desired outcome. “So, what does any of this have to do with DevOps?” you ask? Fair question.

Effective Communication Between Healthcare Professionals - Best Practices

Aug 14, 2020 By OnPage Corporation In OnPage

Effective communication between healthcare professionals is critical for timely and effective operations. In a modern healthcare environment, communication technologies are critical for connecting healthcare professionals with other caretakers and healthcare entities, ensuring the best, most effective, immediate care to patients.

Choosing the Right SRE Tools

Aug 13, 2020 By Emily Arnott In Blameless

Implementing SRE practices and culture can be challenging. Fortunately, there are a variety of tools for each aspect of SRE: monitoring, SLOs and error budgeting, incident management, incident retrospectives, alerting, chaos engineering, and more. In this blog, we’ll talk about what to look for in SRE tools and how they’ll help you on your journey to reliability excellence.

Webinar: AIOps Outcome - Accelerate Incident Resolution

Aug 13, 2020 By CloudFabrix In CloudFabrix

Bite-Size AIOps Knowledge Sessions: Accelerate incident resolution with faster root cause analysis, contextual insights, and recommendations.

Enterprise Alert 2019 Update 8.5.0 released

Aug 13, 2020 By Derdack In Derdack

On August 13th, 2020 we released a new Enterprise Alert version, version 8.5.0. Included in this release are the following product enhancements.

I Have An SLO. Now What? - Alex Hidalgo

Aug 13, 2020 By Blameless In Blameless

It’s 2020: There is a plethora of data available about measuring SLIs and setting SLO targets. But now that you have this data, what are you actually supposed to do with it? The classic example, “Ship features when you have error budget; focus on reliability when you don’t,” is antiquated, too simple, and ignores all of the amazing discussions and decisions you can have with your SLO data. Let’s talk about how you can use SLOs to actually make people happier, from your customers, to your engineers, to your business.

Look Upstream to Solve your Team's Reliability Issues

Aug 12, 2020 By Hannah Culver In Blameless

In “Upstream” by Dan Heath, we explore a variety of problems, ranging from homelessness, to high school graduation rates, to the state of sidewalks in different neighborhoods within the same city. In each of these examples, Dan discusses how upstream thinking decreased downstream work. Upstream thinking is characterized as proactive, collective action to improve outcomes, rather than reaction after an issue has already occurred.

Keeping your teams and customers in the loop during downtime

Aug 12, 2020 By Squadcast In Squadcast

Making your organization more transparent is not always an easy process. In our latest blog post, Adam Hammond shares some tips and tools that can help you get started when it comes to keeping your teams and customers in the loop during downtime. The core message is that you need to make communication a cultural pillar of your organization.
