Operations | Monitoring | ITSM | DevOps | Cloud

Incident Management

The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

FireHydrant Tasks provide turn-by-turn navigation during an incident

An incident has been declared and your runbook has fired. Everyone is gathered in your Slack channel, the tickets are opened, and roles are assigned. Now what? This is when most teams manually update status pages and kickoff investigation streams using a patchwork of tribal knowledge and supporting playbook documents.

Why SREs Need to Embrace Chaos Engineering

Reliability and chaos might seem like opposite ideas. But, as Netflix learned in 2010, introducing a bit of chaos—and carefully measuring the results of that chaos—can be a great recipe for reliability. Although most software is created in a tightly controlled environment and carefully tested before release, the production environment is harsher and much less controlled.

Episode 5: Mooving to... Practical Postmortems

Episode 5, Mooving to… Practical Postmortems covers how to leverage postmortems to effectively learn from failure. Postmortems are a commonplace reference and are now considered a best practice in most modern engineering teams. However, there’s still a lot of confusion on what postmortems should be – and more importantly, what they should NOT be. Thom Duran, Senior Manager of Productivity from Panther walks us through all that and more in the latest Mooving To.. episode!

What should you choose? Docker Swarm vs Kubernetes

Since the introduction of containerisation by Linux many years ago, maturity has shifted from the traditional virtual machine to these containers. These tools have made application development much easier than the initial process. Docker Swarm and Kubernetes came into action when the number of containers increased within a system, they helped orchestrate these containers. A question that arises is, which one is the better option?

Top Incident Response Metrics & How to Use Them

Two categories a software organization should always strive to improve in are: Data analysis is one way that your organization can improve the efficiency of incident management and overall application quality. However, the questions remain – which metrics should be collected? How can analysis of these metrics facilitate these improvements? Read on to hear about five key metrics essential to incident response.

Our fully-redesigned incident response experience delivers a more intuitive workflow

Today we’re releasing fully redesigned Slack and Command Center experiences for FireHydrant so anyone on your team can intuitively navigate the incident response process — in the app or on the web. There are many things you can do ahead of an incident to help things run smoothly: design and document your process, automate predictable steps, train the team, and run drills.

SecOps tools - SecOps & incident management for 2022.

Importance of secOps tools – The threats in the cyber world are becoming more and more complicated and sophisticated with each passing day, while the rapid expansion of digital operations, with more nodes, networks, and servers has resulted in more vulnerabilities. This situation demands efficient SecOps teams as well as practices so that threats are thwarted, and networks and data are always protected. What is SecOps & Best SecOps tools?