Incident Management

The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Training Intelligent Alert Grouping

We’re continuing with our third piece about how to use and improve your Intelligent Alert Grouping (IAG)! In case you missed them, the first two blog posts describe the feature (here) and explain how it uses merging to group alerts (here). We alluded to today’s post at the end of the last one: today we’ll be discussing how to use alert titles to improve IAG matches.

What Your System Outage Notifications Need To Say

System outages happen to the best of us. Communicating with your customers and other stakeholders effectively during downtimes is vital to maintaining a solid relationship with them. When a system outage occurs, technical teams are tasked with swiftly locating the cause and resolving the issue, while communications teams are tasked with notifying stakeholders and customers about the outage to maintain transparency.

Using Event Orchestration to reduce noise and trigger next best action

We often hear from customers that they’re dealing with unmanageable levels of noise and complexity, which makes it harder to pinpoint root cause and get to resolution quickly. All this effort spent on sifting through noise, processing events, and gathering context results in a lot of wasted time. That’s why we’ve launched Event Orchestration, which became generally available to our Event Intelligence and Digital Operations customers on Monday.

Announcing our newest integration: Confluence

Using FireHydrant’s Runbooks, incident and retro data can be automatically sent to Confluence at any point in the incident lifecycle. For example, the moment you’ve resolved an incident, FireHydrant can create a fresh Confluence page with all of the critical incident information stored in FireHydrant. With Runbook conditions, you can choose the perfect moment to send your FireHydrant retro to a Confluence workspace.

Sponsored Post

Five Ways Developers Can Help SREs

Reliability is a team game. The more Developers and SREs collaborate, the greater the success of the product. In this blog, we have listed five best practices that developers can adopt to make the SRE's life easier. It is not easy to be a site reliability engineer. Monitoring system infrastructure and aligning it with the key reliability metrics is quite a daunting task, whereas a software engineer's job is to deliver high-quality software.

Introducing CommsFlow for Context-Rich and Timely Updates to All Stakeholders

We’re so excited to announce our latest platform feature, CommsFlow™! This addition to the core Blameless product offering allows teams to keep stakeholders updated as the reliability of services and applications changes. With our new automated and customizable communication flows, on-call, engineering, and business teams feel a sense of accomplishment and, of course, stay informed.

Get Paid to Write About Mattermost Playbooks

Mattermost Playbooks help software engineering teams orchestrate their work across all tools and teams to plan projects and hit milestones by uniting your tech stack through a single point of collaboration. We want to see how our community is leveraging Playbooks in their own tech stack and share your creations with everyone so the whole community benefits. We’re doing this by launching a new effort to commission original blog articles that show Playbooks in action.

Episode 2: Mooving to Remix: Code You Will be Happy With

Episode 2 of Mooving to… dives into a new tool called Remix, a framework to help create front-end code you’ll love. This episode focuses on a new web framework that helps streamline your processes and eliminate downtime to the best of your ability. Thom Duran and Andrew Leonard of Moogsoft are joined by Kent C. Dodds, Director of Developer Experience at Remix.