Latest News

How We Built and Use Runbook Documentation at Blameless

Even if you don’t notice, you are executing runbooks every day, all the time. When you have an incident in your day-to-day operations, you follow a series of ordered and connected steps to solve it. For instance, if you lose your internet connection, you will follow a series of steps to resolve that issue. The steps could differ depending on your method, but you get the idea.

IT Trends You Don't Want to Miss

The COVID pandemic has redefined the workplace and accelerated the process of digitization for many. Organizations are migrating to systems that are flexible, distributed and resilient. Per Gartner, IT spending will reach $3.9 trillion worldwide in 2021. IT teams will be channeling investments into enterprise software as remote work becomes essential. Systems that support remote work will see a growth of 8.8 percent this year.

Why we went passwordless on our new product

Passwords are dying. The cost of creating and maintaining passwords is becoming untenable, as can be seen in the rise of users logging in with social products and developers outsourcing their pain to Auth0 and the like. We decided to sidestep password-based authentication and went passwordless on our new product. Read on to see how you can go passwordless too.

Using OnPage to Deliver Exceptional Customer Support

The OnPage Customer Support team consists of knowledgeable, friendly technicians who offer 24/7 assistance. Support recognizes the importance of client relationships and always aims to achieve maximum customer satisfaction. The OnPage incident management system is at the center of Support’s quality service delivery. OnPage triggers instant, critical mobile alerts to technicians whenever customer-initiated tickets are created.

SRE as Organizational Transformation: Lessons from Activist Organizers

In the software industry’s recent past, the biggest disruptive wave was Agile methodologies. While Site Reliability Engineering is still early in its adoption, those of us who experienced the disruptive transformation of Agile see the writing on the wall: SRE will impact everyone. Any kind of major transformation like this requires a change in culture, which is a catch-all term for changing people’s principles and behaviors.

Introducing Incident Timer

We’re excited to announce Incident Timer - a “days without an incident” timer for software teams to keep track of major engineering incidents. As the people behind Spike.sh, we keep discussing how to build a culture of reliability with our customers. We loved the idea of the safety/accident timers in factories that keep track of major accidents. It’s a simple and elegant way to keep safety on everybody’s minds.

What is DevOps?

DevOps is a term for a cluster of concepts that has become a movement, “a cross-disciplinary practice dedicated to the study of building, evolving and operating, rapidly-changing resilient systems at scale.” (Jez Humble) The definition of DevOps is not agreed upon by everyone because of the complex processes attached to the term; however, the benefits to teams are universally agreed upon.

Accelerate your logs investigations with Watchdog Insights

If you’re investigating an incident, every minute means degraded performance or even downtime for customers. The causes of an issue often come from parts of your systems and applications that you would not think to check, and the sooner you can bring these to light, the better.

SRE2AUX: How Flight Controllers were the first SREs

In the beginning, there were flight controllers. These were a strange breed. In the early days of the US Manned Space Program, most American households, regardless of class or race, knew the names of the astronauts: John Glenn, Alan Shepard, Neil Armstrong. The manned space program was a unifying force of national pride. But no one knew the names of the anonymous men and, later, women who got the astronauts to orbit, to the Moon, and, most importantly, back to Earth.

What Our Customers Say About the PagerDuty Platform

As noted in this blog a couple of weeks ago, we recently commissioned IDC to interview PagerDuty customers to quantify the business value they gain from our platform. It found that, on average, the 14 PagerDuty customers interviewed gained annual benefits of $3.48 million, a three-year ROI of 795%, and a payback period of just over two months.