
Incident Management

The latest news and information on incident management, on-call, incident response, and related technologies.

Coffee Break Webinar Series: "Intelligent Observability - What the Analysts Say"

We know commitment issues are the real deal, especially when it comes to significant and costly tech investments. Understanding how the market is performing and what’s up ahead is critical for investing in AIOps. Our crew is here to help you through the challenging decision-making days and offer up the best analyst guidance.

Pragmatic Incident Response: 3 Lessons Learned from Failures

In my past experience as an SRE, I’ve learned some valuable lessons about how to respond to and learn from incidents. Declare and run retros for the small incidents. It's less stressful, and action items become much more actionable. Decrease the time it takes to analyze an incident. You'll remember more, and will learn more from the incident. Alert on pain felt by people, not computers. The only reason we declare incidents at all is the people on the other side of them.
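As a rough illustration of the third lesson, the sketch below decides whether to page based on the share of failed user requests rather than on host metrics such as CPU. The `RequestWindow` type, the `should_page` function, and the 5% threshold are all hypothetical names and values chosen for illustration; none of them come from the original post.

```python
from dataclasses import dataclass


@dataclass
class RequestWindow:
    """Counts of user-facing requests over a recent window (hypothetical shape)."""
    total: int
    failed: int


def should_page(window: RequestWindow, max_error_ratio: float = 0.05) -> bool:
    """Return True when users are feeling pain: too many failed requests.

    Host-level signals (CPU, memory) are deliberately not inputs here.
    The 5% default threshold is an assumption, not a recommendation.
    """
    if window.total == 0:
        # No traffic means no measurable user pain in this window.
        return False
    return window.failed / window.total > max_error_ratio


healthy = RequestWindow(total=1000, failed=10)   # 1% of requests failed
hurting = RequestWindow(total=1000, failed=120)  # 12% of requests failed
print(should_page(healthy), should_page(hurting))
```

A real alerting rule would likely also consider latency and how long the condition has persisted; this sketch only shows the idea of keying the alert to what users experience.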

Enabling Faster Incident Response and Mitigating Security Risks in Financial Services

Software is eating the world. Digital Transformation is top of mind for companies looking to meet ever-growing consumer demands and digitize manual processes. This isn’t unique to the technology industry. Ecommerce, finance, healthcare, and other industries are all moving in this direction.

BigPanda's Event Enrichment Engine: The secret ingredient for AIOps

James Beard, the pioneer of television cooking shows, once asked, “Where would we be without salt?” Salt is often underrated, even though it is the ingredient that has the greatest impact on food and flavor in the modern world. It has its own taste, but also balances and enhances the flavor of other ingredients. Salt boosts sweetness and blocks bitterness, and it has scientifically proven capabilities to intensify flavor compounds that are too subtle to detect (i.e.

Monthly Moo Update | July 2021

We hope June was as good to you as it was to us. Our latest updates, available now, will keep you relaxing poolside this summer knowing that your monitoring, event correlation, and incident workflows are all connected and automated through the cloud. If you’re not relaxing with a little cloud coverage keeping you cool, then come check out Moogsoft to see how you can keep your services available and your customers happy, so you can get to relax with a little more time in your day.

What is a Blameless Postmortem?

Do blameless retrospectives (or postmortems) help your team? We will explain what they are, whether they really work, and how to do them right. A blameless postmortem (or retrospective) is a post-incident document that helps teams figure out why an incident happened and brainstorm how to improve the process to prevent similar incidents from happening again. In most engineering organizations, everyone agrees that in complex systems, failure is inevitable.

How Linaro Reduced Triage & Call-Out Time with Flow Designer - xMatters Demo

A test server fails and your customers are relying on it. How long does it take your team to get it back up and running? Does that answer differ depending on the hour of the day, or maybe the day of the week? It doesn’t have to. Join Philip Colmer, Director of Information Services at Linaro, Laura Meadows, VP EMEA at xMatters, and Stephen Walters, Solutions Architect at xMatters, as they discuss the innovative ways Linaro has utilized Flow Designer to reduce triage and call-out time!

Using CC&C Platforms to Transform Metrics Into Valuable Insights

Healthcare institutions are increasingly implementing clinical communication and collaboration (CC&C) platforms to improve the productivity of care teams. Automated CC&C platforms perfect care orchestration plans to ensure providers have the means to satisfy the ever-changing needs of patients. Key features of CC&C platforms include real-time, secure mobile messaging and alerting; digital, intelligent on-call schedules; time-stamped message statuses; and automated alert escalations.