Latest News

On-call compensation models

Providing customers with a world-class and seamless user experience is critical for the success of any business. It is therefore important that you have a robust on-call strategy that optimizes the availability of the right subject matter experts, on-call engineers, and support engineers to resolve critical, user-impacting incidents as soon as possible.

It's a known issue - How Product Managers should deal with issue or feature related enquiries or feedback

I often hear folks in my network being triggered by interactions with product managers within their companies whenever they follow up on certain product-related issues. The triggering phrase invariably is “It’s a known issue”. And they often wonder, well if it’s a known issue, why on earth isn’t anything done about it?

The 3 musts for every FinTech incident management pro

Few industries have experienced such a disruptive whiplash as the financial services industry. With the dizzying encroachment of agile, innovative, and fearless fintechs coming to the fore, traditional banking institutions have had to completely rethink their business, revenue models, and customer engagement initiatives.

How to build a customer advisory board

Regardless of where you are in your product journey, it is imperative that you constitute a customer advisory board whose members can share perspectives on their business challenges, so that you can gain insights on how to shape your road map, develop new features, formulate your vision, and get constant feedback on your product. So, how many customers should you include in a customer advisory board? Should you target higher-level stakeholders or individual users?

Keeping PagerDuty Always On With Remote Incident Response

Earlier this month, many areas of the internet experienced a major incident caused by a router misconfiguration within a highly used service provider. This led to cascading service failures, causing widespread outages and disruptions for several well-known SaaS organizations. When the outage occurred, our teams at PagerDuty immediately noticed a global spike in events and incidents.

How to Improve On-Call with Better Practices and Tools

In the era of reliability, where mere minutes of downtime or latency can cost hundreds of thousands of dollars, 24x7 availability and on-call coverage to respond to incidents have become a requirement for the vast majority of organizations. But setting up an on-call system that drives effective incident response while minimizing the stress placed on engineers isn’t a trivial task.

What's New: Updates to Visibility Console, Event Intelligence, Analytics, and More!

We’re excited to announce a new set of product updates and enhancements to the PagerDuty platform! PagerDuty partners with organizations to help teams create efficiencies across IT organizations and protect customer relationships. These updates will help further improve your team’s ability to manage and reduce noise, automate critical response workflows, and quickly mobilize a response in order to mitigate disruptions across your digital operations when seconds matter.

Enabling the Stripe and Lyft Platforms Through Modern Safety Science

Jacob Scott is an experienced engineer and enthusiastic participant in the resilience engineering community, having spent time caring for the technology systems powering high-growth startups as well as unicorns like Lyft and Stripe. He is deeply passionate about how to apply learnings from modern safety science to real, complex socio-technical systems.

Root Cause Changes: are they the "Elephant in the NOC?" Here's the CTO Perspective

Ask any IT Ops practitioner what the first question they ask is when joining an emergency bridge call, and you’ll get the same answer: “What changed?” Our customers report that changes in their IT environments cause 60% to 90% of the incidents they see. Yet for some reason enterprises still find it difficult to deal with changes and correlate them to the IT incidents they may have caused.

New Integration: Create Zoom incident bridges automatically

Incident response doesn’t only happen in Slack, so today we’re happy to announce our integration with Zoom to create incident bridges automatically. Using the power of FireHydrant Runbooks, a Zoom meeting can be added with fully customizable titles and agendas based on your incident details. Let’s dive into how it works.