The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Three Essential Truths to Delivering Great Customer Experiences

Nov 8, 2019 By Adam Frank In Moogsoft

Employing AIOps for observability, monitoring and service assurance frees developers to focus on building better services.

Smart SLO Alerting With Wavefront

Nov 7, 2019 By Pontus Rydin In PagerDuty

Back in the good old days of monolithic applications, most developers and application owners relied on tribal knowledge for what performance to expect. Although applications could be incredibly complex, the understanding of their inner workings usually resided within a relative few in the organization. Application performance was managed informally and measured casually. However, this model falls apart in a microservices world.

The State of Unplanned Work: Key Findings

Nov 6, 2019 By Evelyn Chea In PagerDuty

It’s a new world order: Skynet has taken over. Just kidding. But it sometimes feels that way, doesn’t it? In the words of Marc Andreessen, software is eating the world, and technology problems are now business problems. This means developers are now the architects of the digital experience and, by extension, the customer experience—and when said developers are unable to innovate quickly, companies are more exposed to competitive threats.

Why Escalations are Important to Clinical Communications

Nov 6, 2019 By Ritika Bramhe In OnPage

Unexpected events make the healthcare profession one of the most challenging industries to navigate and plan for. Sudden patient situations tend to occur, increasing the workload of healthcare providers. Similarly, process efficiency and productivity reflect the care team’s ability to communicate. When teams are on the same page, patient wait times are significantly reduced and results are improved.

Sentry Integration Platform: Optimizing Incident Management with Amixr

Nov 5, 2019 By Matte Noble In Sentry

It’s hard (if not impossible) to imagine production infrastructure without incidents. And service reliability can be highly dependent on how quickly and efficiently engineers are able to tackle these incidents. Reliability engineers are often faced with four questions... Sometimes the answers to these questions are surprising.

RetroDuty: How We Scale Continuous Improvement Beyond Engineering at PagerDuty

Nov 5, 2019 By Derek Ralston In PagerDuty

If you’ve worked on a team that has adopted Agile techniques, you’ve probably heard of a retrospective. If not, here’s the TL;DR: A retrospective is a meeting in which a team connects regularly to reflect on what happens throughout a project and continuously improve how they work moving forward.

Meet Root Cause Changes from BigPanda - IT Ops, NOC and DevOps Teams' Best friend For Supporting Fast-Moving IT Stacks

Nov 5, 2019 By Mohan Kompella In BigPanda

TL;DR: Fast-moving IT stacks see frequent, long and painful outages. Thousands of changes – planned, unplanned and shadow changes – are one of the main reasons behind this. Until now, IT Ops, NOC & DevOps teams didn’t have an easy way to get a real-time answer to the “What Changed?” question – the answer that can help reduce the duration of outages and incidents in these fast-moving IT stacks. Now, with BigPanda Root Cause Changes, they do.

What Is MTTR? Mean Time to Repair, Explained In Detail

Nov 5, 2019 By Ben Munat In XpoLog

Whether you’re slinging code, managing developers, wrangling servers, or filling most other roles in the modern tech firm, you care about keeping your software running while bringing home the bacon. If your website or application is down, you’re not making money. (Or, if you aren’t in this for profit, your message isn’t getting to the people who need it.) Therefore, it’s everyone’s job to keep things running smoothly.

Join the alpha program for Mattermost's new Incident Response Workflow app

Nov 4, 2019 By Jason Blais In Mattermost

Is your InfoSec or DevSecOps team ready to resolve issues as quickly as possible? To help accelerate response times, we’re happy to announce the alpha release of the Mattermost Incident Response Workflow application for Enterprise Edition, supported in Mattermost 5.12 and later. The app is designed specifically for incident response and enables you to connect all your workflows, automate repetitive tasks, and collaborate on incidents—all without leaving Mattermost.

Rise of the Digital Operations Ecosystem

Nov 4, 2019 By Jukka Alanen In PagerDuty

Many organizations today are dealing with a lot of complexity and disconnected tools. Teams and departments are running in parallel but siloed from each other. People are burned out from a lot of manual work, and everyone is crunched for time. This is not a happy ecosystem to live in. If this digital ecosystem doesn’t work together, your teams don’t know what’s going on and they lack the right information.
