Incident Management

The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

A new channel per incident - helpful or harmful?

I caught the tail-end of a Twitter thread the other day which centred around the use of Slack channels for incidents, and whether creating a new channel for each new incident is helpful or harmful. It turns out this is a much more evocative subject than I thought, and since I have opinions I thought I’d share them!

Uptime + Squadcast Integration: Routing Alerts Made Easy

Uptime is a site monitoring solution that checks various endpoints and notifies users via push notifications when downtime is detected. It collects and stores downtime and response time data, which is then made available to users as reports. If you use Uptime for your monitoring needs, you can now integrate it with Squadcast to route detailed alerts from Uptime to the right users in Squadcast. The steps below will help you set up the Uptime and Squadcast integration.

That Rogers Outage is Going to be More Expensive Than You Think

On July 8, 2022, the Canadian telecom company Rogers Communications suffered a major outage that impacted most of Canada for almost two days. This wasn’t completely unprecedented (they’d had an outage in 2021 that impacted their wireless servers for several hours), but the breadth and severity of this one are going to end up costing them far, far more than it seems at first glance.

See the big picture with the Service Dependency Graph

Understanding the impact and scope of an incident when degradation occurs is critical for bringing your service back online. This requires modeling the many downstream and upstream relationships between your services. Our new Service Dependency Graph provides a shortcut: a way to surface dependencies quickly, understand the relationships between services, and determine the scope or impact of an incident.
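To make the idea of incident scope concrete, here is a minimal sketch of how a dependency map can be walked to find every service affected by a failure. It is an illustration only, not how the Service Dependency Graph product is implemented; the service names, the `DEPENDS_ON` map, and the `impacted_services` function are all hypothetical.

```python
from collections import deque

# Hypothetical service map: each key depends on the services in its list.
DEPENDS_ON = {
    "web": ["api"],
    "api": ["auth", "db"],
    "auth": ["db"],
    "reports": ["db"],
    "db": [],
}


def impacted_services(depends_on, failed):
    """Return every service that directly or transitively depends on `failed`."""
    # Invert the edges so we can look up "who depends on X?".
    dependents = {name: [] for name in depends_on}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(service)

    # Breadth-first walk outward from the failed service.
    seen = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


if __name__ == "__main__":
    print(sorted(impacted_services(DEPENDS_ON, "db")))
```

In this toy map, a failure in `db` reaches every other service, while a failure in `web` affects nothing else, which is the kind of scope question a dependency graph is meant to answer quickly.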

August 2022 Update - Change duty status of colleagues, configurable duty notifications and revised password change

Our August update now allows administrators and team administrators to change the service status of other users in the portal. We also made service settings more granular and introduced, for example, the ability to turn off certain push messages when colleagues’ service statuses change. We have also revised how you change your personal password or remote action PIN in the portal. All details are available in this article.

RESOLVE '22: The SOC and the NOC

In our RESOLVE ’22 event The SOC and the NOC, moderator and 3 Tree Tech VP of Cybersecurity Kris Taylor welcomed two esteemed guests to the stage. As Kris noted at the top of the event, we brought our panelists together to talk about “the culture of the network operating center (NOC) and security operations center (SOC).” Along the way, they discussed different philosophical and practical takes on the high-level topics of networking and security.

IHS Markit: Centralizing Incident Management With PagerDuty & ServiceNow

In today’s digital world, organizations are constantly undergoing change. They’re moving to the cloud and rolling out DevOps at scale—all in the name of driving innovation. But moving from a monolith to microservices can lead to applications becoming increasingly distributed. When problems arise, customers don’t care how many teams and services you have, or how complex your architecture is. They only care that your services work when they need them to.