Latest Posts

New related incidents functionality brings order to the chaos of highly complex incidents

Jun 14, 2023 By Joel Smith In FireHydrant

We’ve all been there. You’re working through some rather frustrating blockers during an incident only to discover that you don’t own the dependency at fault. Or, you’ve been pounding away at an issue when a fellow engineer reaches out and asks if your service is affected by some particularly gnarly database failure they’re seeing. But then what? Do you merge efforts and work in parallel or head for a coffee break while the issue gets attacked upstream?

Using PostgreSQL advisory locks to avoid race conditions

Jun 1, 2023 By David Celis In FireHydrant

The first moments of incident response can be among the most crucial, which also makes them among the most stressful. There are many ways to make sure incidents kick off smoothly, but we recently focused on making sure they could kick off quickly. After all, the faster you're able to start mitigating your incident, the more successful you'll be!

Use incident cycle time to optimize your incident response process

May 31, 2023 By Jouhné Scott In FireHydrant

Although the causes and solutions for incidents vary widely, most incidents follow a similar timeline from declaration to resolution. The time it takes to move from one phase or milestone of an incident to the next is what we call cycle time.
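
As a rough illustration of that definition, here is a minimal Python sketch that computes cycle times from a list of timestamped milestones. The milestone names, the timestamps, and the `cycle_times` helper are hypothetical examples chosen for illustration, not FireHydrant's actual milestone list or API.

```python
from datetime import datetime

# Hypothetical milestones for a single incident (illustrative only; the real
# milestone names and order depend on your incident process).
milestones = [
    ("started", datetime(2023, 5, 1, 9, 0)),
    ("detected", datetime(2023, 5, 1, 9, 12)),
    ("acknowledged", datetime(2023, 5, 1, 9, 15)),
    ("mitigated", datetime(2023, 5, 1, 10, 5)),
    ("resolved", datetime(2023, 5, 1, 11, 0)),
]


def cycle_times(milestones):
    """Return (from_milestone, to_milestone, minutes) for each consecutive pair."""
    # Sort by timestamp so milestones recorded out of order are still paired correctly.
    ordered = sorted(milestones, key=lambda m: m[1])
    result = []
    for (prev_name, prev_time), (next_name, next_time) in zip(ordered, ordered[1:]):
        minutes = (next_time - prev_time).total_seconds() / 60
        result.append((prev_name, next_name, minutes))
    return result


for start, end, minutes in cycle_times(milestones):
    print(f"{start} -> {end}: {minutes:.0f} min")
```

With these sample timestamps, the longest cycle is between mitigation and resolution (55 minutes), which is the kind of phase a team might look at first when trying to shorten the overall timeline.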

The fastest and most robust path to incident declaration from monitoring tools

May 18, 2023 By Joel Smith In FireHydrant

Here’s a crazy question: why do we still require a human to manually declare an incident for the things that we know are incidents? If we have enough confidence to build SLOs and high-severity alert routes for these specific scenarios, why are we still asking a human to confirm it’s an incident and get the assembly process in motion? Isn’t that just another button to push when we could be problem solving instead?

Status page best practices

May 10, 2023 By Daniel Condomitti In FireHydrant

Some organizations hesitate to publicly announce when they have an incident, fearing that acknowledging outages will scare customers away, but the opposite is often true. When you proactively communicate with your customers, even during bad times, you have the opportunity not only to build trust but also to buy grace during the incident.

Assembly time is where you have the most control of an incident

May 4, 2023 By Robert Ross In FireHydrant

The FDNY EMS Command responds to more than 4,000 calls per day. They range from car accidents to building fires to cats stuck in trees, and responses vary accordingly: some take hours, others just a few minutes. With such unpredictable conditions, the FDNY focuses on improving what it calls “response time.” That’s the amount of time between a 911 call being made and emergency responders arriving on the scene. This might sound familiar.

How to get started with incident management metrics

May 2, 2023 By Jouhné Scott In FireHydrant

Tracking incident metrics can help you discover patterns in the causes and costs of incidents and understand the brittle parts of your organization. We've seen them help teams zero in on things like:

But it can be intimidating to get started. Do you really need metrics if you're a small team or just beginning to formalize your incident management program? I say yes. The key is to start with something manageable and grow.

Forgot to declare an incident? Add it retroactively in FireHydrant.

Apr 27, 2023 By Joel Smith In FireHydrant

Have you ever quickly worked through an issue with your team and later thought, “Huh. That probably should have been an incident”? It happened to us just a few weeks back. After one of our engineers surfaced a failed build, a few folks chimed in to problem solve, and within 30 minutes things were up and running like normal. But we probably should have declared an incident.

Two data-backed ways to resolve incidents faster

Apr 18, 2023 By Chris Kelly In FireHydrant

Incidents are expensive — and only getting more so. In fact, more than 98% of large companies and 47% of small- and medium-size companies say a single hour of downtime costs at least $100,000, according to the 11th annual Hourly Cost of Downtime Survey.

The why and how behind running incident response game days

Apr 11, 2023 By Jouhné Scott In FireHydrant

In any high-pressure situation, the key to fast action is preparedness. And that’s true when it comes to incidents, too. Documenting and training your team on your incident response processes is essential to ensuring a coordinated and efficient response effort. And training sessions, or game days, as they’re sometimes called, are one way to get everyone up to speed.
