
The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

A Day in the Life: Intelligent Observability at Work with our SRE, Dinesh

When I asked Charlie for permission to attend this year’s AICon (virtual, natch), I thought it would be a shoo-in; learning’s part of my OKRs, after all. But he never makes things easy, and his ‘yes’ came with a caveat, as it usually does with him. This time, he claimed he didn’t have the budget for the ticket (a likely story!) and I’d have to find another way to get one.

Understanding a Microsoft Service Outage

Maintaining business continuity when an issue arises is a challenge many organizations struggle with. The global pandemic that began in Q1 of 2020, and that many businesses are still navigating, introduced a new set of problems for both service providers and the businesses that rely on their services.

Enhance NOC Alerts With Incident Management and Alert Automation

In a network operations center (NOC), alerts originating from hundreds of servers, application monitoring systems, emails and ticketing services compete for a NOC analyst’s attention. NOCs face many challenges in parsing these alerts to identify actionable notifications and mobilize the right response team.

What is Opsgenie?

Opsgenie is an on-call, alert management, and incident response solution that keeps services always on. It empowers Dev and Ops teams to plan for service disruptions and stay in control during incidents. With over 200 deep integrations and a highly flexible rules engine, Opsgenie centralizes alerts, notifies the right people reliably, and enables them to collaborate and take rapid action.

Celebrities Explain WTF is Incident Management

Our friends Felicia Day, Steve Wozniak, and Brian Baumgartner help us explain what the heck incident management is. FireHydrant is the only comprehensive incident management platform that allows you to create consistency across the entire incident response lifecycle so you can focus on fighting fires faster. From alert to retrospective, including tracking, communicating, and reporting on results, FireHydrant automates the process so you can focus on resolution. Visit firehydrant.io to learn how you can manage the mayhem.