The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Business Continuity Planning and Effective Communication - by Laura Toplis

With many companies relying on remote working during the COVID-19 pandemic, effective communication is more important than ever. Unfortunately, being in the middle of responding to a global pandemic will not prevent your organization from suffering other business disruptions. Likely disruptions you may face include cyber and phishing attacks: these can cripple your regular communication methods, such as email, or exploit ineffective communications to extract illegal payments.

Tips for Modern NOCs - Correlating Incidents to the IT Changes that Caused Them

Every NOC engineer will tell you that the first thing they look for in an outage is “what changed?” And they are right to look. While every organization is unique, Gartner reports that, on average, about 80% of IT incidents today are caused by changes in infrastructure and/or software.
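As a rough illustration of the “what changed?” question, the sketch below lists the changes applied shortly before an incident began. It is not the article's method: the `Change` record, the `recent_changes` helper, the two-hour window, and the sample data are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Change:
    # Hypothetical change record; field names are assumptions.
    change_id: str
    service: str
    applied_at: datetime


def recent_changes(incident_start, changes, window=timedelta(hours=2)):
    """Return changes applied within `window` before the incident, newest first."""
    candidates = [
        c for c in changes
        if incident_start - window <= c.applied_at <= incident_start
    ]
    return sorted(candidates, key=lambda c: c.applied_at, reverse=True)


# Made-up sample data for illustration only.
changes = [
    Change("CHG-0", "dns", datetime(2020, 5, 1, 8, 0)),
    Change("CHG-1", "db", datetime(2020, 5, 1, 10, 30)),
    Change("CHG-2", "api", datetime(2020, 5, 1, 11, 30)),
    Change("CHG-3", "cdn", datetime(2020, 5, 1, 12, 15)),
]
incident_start = datetime(2020, 5, 1, 12, 0)

suspects = recent_changes(incident_start, changes)
print([c.change_id for c in suspects])  # ['CHG-2', 'CHG-1']
```

A time window only narrows the list of suspects. In practice the candidates would probably also be filtered by the affected service or its dependencies before anyone treats a change as the cause.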

OnPage Overrides Silent Switch on iOS and Do Not Disturb Mode

Since its inception, OnPage has been dedicated to providing a powerful critical alerting solution. This mission continues in 2020, as OnPage is pleased to introduce its ability to override the silent switch and Do Not Disturb (DND) mode on iOS. The latest advancements ensure that tasked recipients always receive high-priority, audible OnPage alerts, regardless of their current iPhone settings.

When Incidents are not investigated, Problems await

Incident Management and Problem Management are two very different practices in IT service management, yet the terms are unfortunately often used interchangeably. On the surface, it might seem like just a matter of terminology. But what if you learned that one is a small hiccup and the other could dent your entire quarterly or annual results?

On-call On-boarding Checklist

And it starts with the company culture. Irrespective of how small or large your team is, it’s wise to invest some time in creating a good on-call onboarding plan. A humane on-call rotation is the mark of a good engineering culture. Being on-call means that you’re expected to be reachable for any issues that may occur during your shift. It’s easy to lose all motivation just by anxiously anticipating that mid-dinner ping.

Creating powerful automations with n8n and Mattermost

Tanay is the Head of Developer Relations at n8n. He has published books on WebVR, virtual assistants on Raspberry Pi, and FirefoxOS. He has been listed in the about:credits of the Firefox web browser for his contributions to the different open source projects of the Mozilla Foundation. I’ve been involved in the DevOps world for a while and yet I finished reading The Phoenix Project only recently. The book piqued my interest in how teams execute their incident response playbooks.

Announcing Our Series A

It was Friday at about quitting time, and my plans for the evening involved a great cocktail, hanging out with friends, and maybe continuing to binge The Office. Sadly, there was a problem. Our alerting system detected an enormous and immediate spike in errors. The error description was along the lines of “table ‘servers’ does not exist”, and thousands of customers couldn’t use a large cloud provider’s services.

Driving Real-Time ChatOps With PagerDuty and Microsoft Teams

With over 75 million daily active users, it’s safe to say Microsoft Teams is essential to many global businesses. On top of that, Microsoft CEO Satya Nadella recently shared that Microsoft saw 200 million meeting participants in a single day this month. While Microsoft Teams’ explosive growth can be tied to recent spikes in remote work, many enterprises have relied on Teams to connect people across the globe for quite some time.