Operations | Monitoring | ITSM | DevOps | Cloud

Latest Posts

3 questions to ask in the build vs buy debate for incident response tooling

As a former incident responder and now as a responder advocate for FireHydrant, I’ve seen the “build vs. buy” debate play out many times. In fact, I even supported the tool that former employers used for managing incidents for years before they decided to buy (more on that in a future blog post).

Getting started with severity levels

An incident can take many forms. It can look like a small issue that locks a few customers out of their accounts or a huge catastrophe that brings down your entire product for a full day. How you respond to the incident should vary based on the impact of the incident. And that’s where severity comes into play. Defined severity levels are crucial to any good incident management program.

FireHydrant is now more powerful across the entire incident lifecycle

FireHydrant has partnered with incredible companies to transform incident response inside their organizations, but our goal has always been to support the full incident lifecycle. That’s because we know that investing in good incident management can kickstart your reliability efforts when it includes both a streamlined incident response process that helps you recover faster and the ability to learn from incidents and then feed those insights back into your system.

New reports stress the importance of strategic incident management practice

Engineers have been managing incidents for as long as they’ve been building software, but the idea of incident management as a strategic practice in its own right is still finding its place. We’re starting to see big shifts in that area, though — more companies are dedicating headcount, resources, and tools to help them better prepare for, respond to, and learn from their incidents.

There's a better way: how an incident management tool helps you conquer response challenges

As a solutions engineer for FireHydrant, I speak with a wide variety of companies about their incident management programs — from start-ups with a handful of employees to large enterprise companies with thousands of engineers. Whether they’re looking to establish their incident management program or mature it, the same questions remain.

Identify and manage impacted customers with our new Zendesk integration

Customer support tickets are a key indicator of which customers are being actively impacted by an incident. Incident-related support tickets are an important component of impact assessment, incident prioritization, and effective stakeholder communications. FireHydrant's new Zendesk integration allows Enterprise tier users to: With our Zendesk integration you can streamline customer impact assessments and incident communications, resulting in reduced support response times and incident durations.

See the big picture with the Service Dependency Graph

Understanding the impact and scope of an incident when degradation occurs is critical for returning your service online. This requires modeling the many downstream and upstream relationships between your services. Our new Service Dependency Graph provides a shortcut – a way to surface dependencies quickly, understand the relationship between services, and determine the scope or impact of an incident.

We've made it even easier to manage your FireHydrant configuration with Terraform

Many of our customers use FireHydrant’s verified Terraform provider to track configuration changes, ensure consistency, and automate repetitive configuration tasks. Back in March we streamlined our Terraform provider support for service catalog configuration. Today we are releasing extensive Terraform provider improvements for configuring runbooks, task lists, service dependencies, incident roles, and more.

To require or not require (fields): that is the question

Required fields have been a hot topic at FireHydrant. Choose too many (or the wrong ones), and you unnecessarily annoy your team during an incident or encourage sloppy data entry that someone has to come back and clean up manually. Don't use them at all and risk insufficient data to efficiently propel an incident toward resolution.