Latest News

How we've made Status Pages better over the last three months

A few months ago we announced Status Pages, the most delightful way to keep customers up to date about ongoing incidents. We built them because we realized that there was a disconnect between what customers needed to know about incidents and how easily accessible this information was. As we built them, we focused on designing a solution that powered crystal-clear communication without the overhead, all beautifully integrated into incident.io.

Top 5 Resiliency Trends of 2023

In today’s world, resilience is no longer a nice-to-have or a methodology to experiment with; it has become a necessity for sustained success in software development and IT operations. As DevOps and Agile teams keep pushing boundaries, developing new methodologies, and driving innovation, the ability to recover quickly from failures, adapt to changing conditions, and maintain high performance under pressure has become essential.

Twelve Key Learnings from PagerDuty People Team's Generative AI HackWeek

Sometimes innovation requires ideas unconstrained by traditional structures and removed from day-to-day responsibilities. It was in this spirit that PagerDuty’s People HackWeek–a friendly competition to explore how generative AI might impact the future of HR–was born.

The balancing act of reliability and availability

As consumers, we expect the products and software we buy to work 100% of the time. Unfortunately, that’s impossible. Even the most reliable products and services experience some disruption in service. Crashes, bugs, timeouts. There are a ton of contributing factors, so it's impossible to distill disruptions down to a single cause. That said, technology is becoming more and more sophisticated, and so is the infrastructure that supports it.

A better Grafana OnCall: Delivering on features for users at scale

Enterprise IT is just a different animal. Whether it’s operating at scale, undertaking massive migrations, working across scores of teams, or addressing tight security requirements, engineers at these organizations can face different obstacles than their counterparts at smaller organizations and startups.

Transformation in Travel: Our Q&A with TUI's Head of Technology

The travel industry is experiencing an unprecedented surge in demand from people seeking adventure and eager to explore new destinations. Given an abundance of choice and the desire to have a personalized experience, customers are turning to tour operators to remove complexity from planning so they can focus on the holiday and not on the process of planning it.

Implementing Zero Trust: A Practical Guide

According to the Harvard Business Review, 2022 saw more than 83% of businesses experiencing multiple data breaches. Ransomware attacks, in particular, were up 13%. With cyber security being such a hot topic for business owners, it’s no surprise implementing a zero trust policy has become so important. In this guide, we’ll cover how to implement zero trust and why it’s important for your business to do so. Let’s get started.

Mastering Incident Resolution: Process and Best Practices

For DevOps and IT teams, incident resolution is an important aspect of predicting, resolving, and documenting service disruptions. It refers to the part of the incident management process where responders restore the service to a working state. Modern technology has come a long way, but it’s not without flaws. When businesses suffer cyber-attacks, system crashes, and network outages, the organization is affected on many levels.