Latest News

Why More Incidents Are Better

Ask most SREs how many incidents they’d have to respond to in a perfect world, and their answer would probably be “zero.” After all, making software and infrastructure so reliable that incidents never occur is the dream that SREs are theoretically chasing. Reducing actual incidents by as much as possible is a noble goal. However, it’s important to recognize that incidents aren’t an SRE’s number one enemy.

Why Operational Maturity Helps Businesses Reduce the Great Resignation Trend

The past few years have led to fundamental business and cultural shifts for both companies and employees. COVID-19 has brought opportunities for companies that invested early in digital operations, while others struggled to maintain the status quo. That struggle gave rise to record employee burnout and to what is now commonly referred to as the Great Resignation.

3 mistakes I've made at the beginning of an incident (and how not to make them)

The first few minutes of an incident are often the hardest. Tension and adrenaline levels are high, and if you don’t have a well-documented incident management plan in place, mistakes are inevitable. It was actually the years I spent managing incidents without the right tools in those high-tension moments that inspired me to build FireHydrant. I built the tool I wished I’d had when I was trying to move fast at the start of incidents.

Better Data for Public Health: How Nexleaf and PagerDuty are Monitoring Healthcare

Having a reliable power source is something many of us take for granted. It is particularly important for healthcare facilities to have a consistent, reliable power source so that care for vulnerable patients, specifically those who rely on electricity to sustain their lives, is not disrupted. In rural Sub-Saharan Africa, however, it’s estimated that only about 28% of hospitals have reliable electricity.

What It Means to Be an Incident Commander

Leadership is essential in an organization. Establishing a leadership hierarchy helps teams avoid getting confused about who to turn to with questions and concerns, allowing them to focus their efforts where needed. High-quality leadership is vital to success but becomes even more important when the pressure to resolve an issue with minimal downtime is turned up.

Engineering Manager from a non-STEM background?

A hiring manager looks at a long list of requirements before hiring an Engineering Manager: a candidate needs a balance of technical and leadership skills to perform well in the position. Engineering Manager roles differ from company to company. It is hard to list what a day in an engineering manager’s life looks like.

Uncovering the mysteries of on-call

For the vast majority of organisations, some form of round-the-clock cover is critical to successful business operations. On-call is an essential part of an effective incident response process, yet there is no commonly accepted playbook on how to most effectively structure and compensate on-callers. We ran a survey to uncover the mysteries of how on-call works in organisations of different shapes and sizes around the world.

What is Live Call Routing?

If there’s one essential thing we’ve learned from being in the business of digital operations for more than 13 years, it’s that every business has a unique approach to building resilience with its bespoke tech stacks and processes. Many PagerDuty customers around the world are starting to provide direct access to their on-call teams with Live Call Routing (LCR).