Latest Posts

What Is Site Reliability Engineering? Understanding the complexities of this crucial function

Site reliability engineers manage a lot, and often in incredibly high-stakes environments. Remember that scene from "The Matrix" where Neo dodges bullets in slow motion? Of course you do. As an SRE, it can feel like you're the person getting hit by those bullets, frantically trying to investigate performance issues, automate away toil, and support the engineers around you, all before the next wave of attacks.

How we achieved pixel-perfect polish during our Status Pages launch

A few months ago, we released Status Pages. This project was quite different from anything we’ve approached before, given that: And our goals were a departure from ones we had set in the past: With this in mind, we worked closely with our designer throughout the process of building Status Pages. Here is how we approached it and a few lessons we learned along the way!

Catalog vs. Thanos: Who came out on top?

Catalog is really, really powerful. To prove it, our latest product went up against the almighty Thanos and won decisively. Don’t believe us? Just look at how unscathed Catalog was once the dust settled: All jokes aside, we spent months building out what we think is one of the most capable products on the market today. Designed to be a map of everything that exists in your organization, Catalog can meaningfully help you level up your incident response.

Effective incident escalations

In the ever-evolving digital landscape, every organization must confront its fair share of incidents. Regardless of the sector or size, one common thread weaves through them all: the need for effective incident management. A crucial part of this management is incident escalation, a topic on which we've had many discussions with various companies.

Synchronizing mental models

In the heat of an incident, having a clear and shared understanding of what’s going on is absolutely crucial to effective response. But often what actually happens is that people involved in incidents build their own picture and narrative of the event, shaped by their own expertise, their past experiences, and what they’re seeing and hearing as the incident develops. The pictures and perspectives people build are often referred to as mental models.

Announcing Catalog - the connected map of everything in your organization

One of the most painful parts of incident response is contextualizing the problem and understanding how and where it fits within your organization. If responders are unable to answer basic questions such as: Then you waste valuable time talking to the wrong people or solving the wrong problems — ultimately extending impact and hurting your response. It’s a common issue that, up until now, didn’t have a clear solution or workaround.

How our product team use Catalog

We recently introduced Catalog: the connected map of everything in your organization. In the process of building Catalog as a feature, we’ve also been building out the content of our own catalog. We'd flipped on the feature flag to give ourselves early access, and as we went along, we used this to test out the various features that Catalog powers.

Services are not special: Why Catalog is not just another service catalog

As you may have already seen, we’ve recently released a Catalog feature at incident.io. While designing and building it, we took an approach that’s a tangible departure from a traditional service catalog. Here’s how we’re different, and why.

Using DORA metrics Mean Lead Time for Changes to deliver iterations faster

Here's what you can expect to learn from this article: Raise your hand if you like shipping changes quickly. (Yes, let's assume that everything you're shipping has value and isn't a vanity project.) Chances are, you, the person reading this now, agreed with the above. When you start on a project, big or small, you want to keep any changes moving along and avoid getting stuck. The less time between the beginning and end of a project, the faster you can shift your focus to other things.