Incident Management

The latest news and information on incident management, on-call, incident response, and related technologies.

Top 5 Resiliency Trends of 2023

In today’s world, resilience is no longer an optional aspiration or a methodology to experiment with; it has become a necessity for sustained success in software development and IT operations. As DevOps and Agile teams push boundaries, develop new methodologies, and drive innovation, the ability to recover quickly from failures, adapt to changing conditions, and maintain high performance under pressure is now essential.

Twelve Key Learnings from PagerDuty People Team's Generative AI HackWeek

Sometimes innovation requires ideas unconstrained by traditional structures and removed from day-to-day responsibilities. It was in this spirit that PagerDuty’s People HackWeek, a friendly competition to explore how generative AI might impact the future of HR, was born.

The balancing act of reliability and availability

As consumers, we expect the products and software we buy to work 100% of the time. Unfortunately, that’s impossible. Even the most reliable products and services experience some disruption: crashes, bugs, timeouts. There are many contributing factors, so it’s impossible to distill disruptions down to a single cause. That said, technology is becoming more and more sophisticated, and so is the infrastructure that supports it.

The Unplanned Show, Episode 13: Jake Cohen and Generative AI for Automation

On the heels of the public beta opening for AI-generated runbooks in Runbook Automation, we asked Jake Cohen from product management how this differs from generating code with something like ChatGPT or the various AI-powered code completion tools available. We get into prompt engineering, managing output quality, and privacy and security concerns.