The latest news and information on incident management, on-call, incident response, and related technologies.

How communication can make or break your incidents - incident.fm

In this episode, Pete and Lisa discuss why great communication is essential to the success of any incident management process. From keeping your wider team in the loop to minimise disruption, to using customer communication to strengthen your brand when things go wrong, the team share their experiences and top tips for having a transparent incident communication culture.

PagerTree Broadcasts

PagerTree broadcasts are a great way to send mass messages to multiple teams or users (think of an all-hands-on-deck situation). When using the broadcasts feature, you can send one-way messages and optionally request a response. PagerTree's intelligent on-call alert routing gives teams flexible schedules, escalations, and reliable notifications via email, SMS, voice, chatbots, and smartphone app.

How to Avoid Common Software Deployment Challenges

Software deployment is the manual or automated process of making software available to its intended users. It's often the final, and most important, stage in the Software Development Lifecycle (SDLC). Software deployment is a three-stage process. All software deployments pose challenges, and issues can arise in any of the three stages.

The State of AIOps: A New Years' Message from Chief Moo Phil Tee

Well, that was fast! Another year has come and gone. It is safe to say 2020, ‘21 and ‘22 were exceptional, and only sometimes for good reasons. But I take heart in society’s steady progress toward digital maturity through it all. Nearly 100% of IT leaders say the pandemic accelerated their organization’s rate of digital transformation.

PagerDuty and RedMonk Present: What is Automated Diagnostics? Part 2 - Demo

Join PagerDuty’s Jake Cohen (Senior Product Manager) and RedMonk’s Kelly Fitzpatrick for a conversation and demo on automated diagnostics, process automation, and incident response. It’s all about automation helping first responders determine whether there is an issue, decide which domain experts (if any) should be brought in to assist, and resolve the issue as quickly as possible. Part 2 of this 2-part video focuses on the concept and use case of automated diagnostics.

How communication can make or break your incidents

In this episode, Pete and Lisa discuss why great communication (both internally and externally) is essential to the success of any incident management process. From keeping your wider team in the loop to minimise disruption, to using customer communication to strengthen your brand when things go wrong, the team share their experiences and top tips for having a transparent incident communication culture.

How JPMorgan Chase uses Grafana and AI to monitor SLOs, SLIs, and more

For the team at JPMorgan Chase, the daily stakes of having a stable system are high. “We are in the business of making sure that trades are executed, and systems are stable and up and running for a positive client experience,” said Askari Imam, VP, Asset Wealth Management (Product and Integration Delivery).

A better way: 3 incident response areas prime for automation

By automating some rote parts of incident response, you reduce decision fatigue and help responders get to solving the problem faster with less stress. In this post, we talk about three areas of the incident response process that are prime for automation.