Incident Management

The latest news and information on incident management, on-call, incident response, and related technologies.

Grafana OnCall is now generally available on Grafana Cloud, with a generous free tier

Today we’re announcing the general availability of Grafana OnCall on Grafana Cloud for all paid and free plans. A big part of delivering great software is ensuring the right people get the right information when the inevitable incidents occur. We want to help you do that with Grafana OnCall, an easy-to-use, developer-first on-call management tool that’s built on top of the Grafana stack you know and love.

Top tips to make Round Robin Scheduling successful for your team

You may have heard of Round Robin Scheduling before and thought to yourself, is this right for my team? Understanding how Round Robin Scheduling can be used and what teams it works best for is important when considering this method of on-call. Additionally, it comes with some pitfalls you’ll want to avoid, as well as best practices to adopt. In this blog post, we’ll share everything you need to know about Round Robin Scheduling within PagerDuty and how to get started.
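As a rough illustration of the idea, a round robin rotation simply hands each consecutive shift to the next person in a fixed, repeating order. The sketch below is a generic Python example, not PagerDuty's implementation or API; the function name `round_robin_schedule`, the responder names, and the one-week default shift length are all assumptions made for illustration.

```python
from datetime import date, timedelta
from itertools import cycle, islice


def round_robin_schedule(responders, start, shifts, shift_days=7):
    """Assign consecutive shifts to responders in a fixed, repeating order.

    Hypothetical helper for illustration only; real on-call tools also
    handle overrides, time zones, and escalation layers.
    """
    # cycle() repeats the responder list endlessly; islice() stops after
    # the requested number of shifts.
    rotation = islice(cycle(responders), shifts)
    schedule = []
    for index, responder in enumerate(rotation):
        shift_start = start + timedelta(days=index * shift_days)
        shift_end = shift_start + timedelta(days=shift_days)
        schedule.append((shift_start, shift_end, responder))
    return schedule


if __name__ == "__main__":
    # Made-up team of three covering four weekly shifts.
    team = ["ana", "ben", "chi"]
    for shift_start, shift_end, responder in round_robin_schedule(
        team, date(2022, 1, 3), shifts=4
    ):
        print(f"{shift_start} to {shift_end}: {responder}")
```

With three responders and four shifts, the fourth shift wraps back to the first person, which is the property that keeps the load evenly spread over time.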

No capes: the perils of being a hero-engineer

When I first started out as an engineer, I really leant into the idea of what’s often called “being a hero”; I would get to the office a bit early to make sure I could fix anything that had gone wrong overnight. I loved the camaraderie of someone outside engineering bringing their laptop over with a critical process broken for me to fix (even if I’d been the one to break it!). Being a hero feels really good for a while, but over time, it loses its shine.

Getting Started with Playbooks

It’s 2022: You’re good at your job, you’re maintaining modern systems, and now you want to level up your team based on a solid foundation of their collective expertise. You want to standardize and centralize process documentation and make execution as easy and effective as possible so that everything runs smoothly, every time.

What's New: Updates to Event Intelligence, On-Call Management, Automation, Mobile, and More!

We’re excited to announce a new set of updates and enhancements to the PagerDuty platform. Recent updates from the product team span On-Call Management, Event Intelligence, and Mobile, as well as PagerDuty Community & Advocacy Events.

Intelligent Service Design

Hello and welcome to the fourth post in our EI Architecture series focusing on Intelligent Alert Grouping. Previously, we talked about how to train Intelligent Alert Grouping using incident merges and how to configure your alert titles to improve default matching. In this post, we’re going to cover how service design can impact your experience with Intelligent Alert Grouping, as well as with the PagerDuty app in general.

Reliability Through Automation for Your Infrastructure and Applications at Scale

As technology becomes more SaaS-based and organizations deploy applications in multiple clouds, they need more visibility into the cloud environment and better automation for incident response and resolution. Two elements are required to achieve this: integrations and workflows in an incident response software solution, and effective experimentation, research, and testing in the cloud and on-premises.

DevOps Tools (All of the Tools Your Team Needs)

Wondering about DevOps Tools? We explain the best tools for every step of the DevOps development process. What are DevOps Tools used for? DevOps relies on effective tools to help teams manage the entire software development lifecycle. These tools can automate tasks, monitor applications, and facilitate sharing of information between teams.