Incident Management

The latest news and information on incident management, on-call, incident response, and related technologies.

Structuring Your Teams for Software Reliability

How well positioned is your team to ship reliable software? What are the different roles in engineering that impact reliability, and how do you optimize the ratio of software engineers to SREs to DevOps within teams? These questions can be hard to answer in a quantifiable way, but projecting different scenarios using systems thinking can help. Will Larson’s blog post Modeling Reliability does just that, and serves as inspiration for this article.

Got Game? Secrets of Great Incident Management

When his phone wakes him at two in the morning, operations engineer Andy Pearson knows it’s bad news. There’s a major server problem, and hundreds of client websites are down. Automated monitoring checks detected the outage within seconds and paged the on-call engineer. This time, it’s Pearson in the hot seat. Pearson quickly confirms the issue is real and escalates it to his boss, tech lead Lewis Carey.

Incident Response - how great companies do it

An incident response plan is a predefined course of action that tells IT teams how to respond efficiently to critical IT events. As modern applications grow in scale and complexity, more people work on more interdependent systems. Consequently, the question is not whether a system will fail, but when, and how best to respond.

AIOps: What's in a name?

Since the term ‘AIOps’ came into use in the monitoring sector a couple of years ago, there has been much confusion about what it means. We hear from users asking if they need it – a difficult question given that the answer depends on how you define it. Since there isn’t a broadly accepted definition, a range of vendors now market their products as AIOps offerings, even though these products cross subsectors and may not be directly competitive.

Zendesk and PagerDuty: Helping Teams Work Together in Harmony

Your customers’ expectations are changing rapidly—they expect on-demand and personalized support whenever they interact with businesses. If one business doesn’t meet their expectations, they can easily order online from a different company, change service providers, or download a different app.

How to use SIGNL4 for availability monitoring of Enterprise Alert

Enterprise Alert is the leading enterprise-class software for automated communication and incident response, providing push notifications, SMS text messages, voice calls and emails to deliver instant notifications. With two-way smart connectors, built-in duty scheduling, customizable escalation workflows, and remote actions, teams can be confident that critical events are received and handled in a timely manner.

Don't Get Left Behind: Augmenting Decisions in DevOps With AIOps

DevOps is fast, glamorous and agile. It is key to keeping modern, fast-moving IT environments up and running. And it is no stranger to automation: DevOps has been relying on automation for many years now to ensure the rapid delivery of applications in this ever-changing landscape. Yet even the most agile and advanced DevOps teams cannot escape the growing complexity, scale and pace of the modern IT stack.