Incident Management

The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

How to Classify Incidents

Incident classification is a standardized way of organizing incidents into established categories. Incidents can include outages caused by errors in code, hardware failures, resource deficits, or anything else that disrupts normal operations. Each new incident should be assigned a category based on the areas of the service affected and a rank based on its severity. Each classification should have an established response procedure associated with it.
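As a rough illustration of the idea above, the sketch below models a classification as an affected area plus a severity rank and looks up a response procedure for it. The severity levels, the percentage thresholds, the area names, and the procedure texts are all hypothetical placeholders, not taken from any particular tool or from the article.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical severity ranking; a lower number means more severe."""
    SEV1 = 1
    SEV2 = 2
    SEV3 = 3


@dataclass(frozen=True)
class Classification:
    """An incident category: the affected service area plus a severity rank."""
    area: str
    severity: Severity


# Hypothetical response procedures, one per classification.
RESPONSE_PROCEDURES = {
    Classification("database", Severity.SEV1): "Page the on-call lead and open an incident channel",
    Classification("database", Severity.SEV2): "Notify the on-call engineer",
    Classification("frontend", Severity.SEV3): "File a ticket for the next business day",
}

# Fallback for classifications that have no procedure defined yet.
DEFAULT_PROCEDURE = "Triage manually and define a procedure for this category"


def rank_severity(users_affected_pct: float) -> Severity:
    """Rank severity from the share of users affected (assumed thresholds)."""
    if users_affected_pct >= 50.0:
        return Severity.SEV1
    if users_affected_pct >= 5.0:
        return Severity.SEV2
    return Severity.SEV3


def response_for(classification: Classification) -> str:
    """Return the response procedure associated with a classification."""
    return RESPONSE_PROCEDURES.get(classification, DEFAULT_PROCEDURE)


if __name__ == "__main__":
    incident = Classification("database", rank_severity(60.0))
    print(incident.severity.name, "->", response_for(incident))
```

Keeping the procedures in a single lookup table makes it easy to spot categories that have no response defined; here they fall through to the default rather than raising an error.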

SLOs for AWS-based infrastructure

In our latest two-part blog series, Gigi Sayfan, author of “Mastering Kubernetes”, discusses managing complex infrastructure on AWS with an eye towards SLOs (service level objectives). There are many ways to discuss the management of infrastructure. In this series, he covers SLOs for AWS, Observability on AWS, Quotas and Limits, and Optimizing cost on AWS. In the second part, he uses the lens of Kubernetes to compare and contrast compute infrastructure on AWS with Kubernetes.

Improve Customer Experiences & Collaboration Between Support and Engineering With Bidirectional Communication

We are delighted to announce our new PagerDuty integration for Salesforce Cloud. This integration empowers Customer Service, Engineering, and IT teams to proactively resolve customer issues in real time by improving communication and collaboration.

Google Cloud OnAir with CEO Ashar Rizqi: Benefits of Cloud Infrastructure

CEO Ashar Rizqi had the pleasure of being a guest on Google Cloud OnAir, a Google Cloud customer interview series. Ashar and interviewer Jimmy Sopko discussed how Blameless has extended its runway using Google Cloud and Google Kubernetes Engine, and how the team cultivates a culture of site reliability in a changing world.

Incident Page Updates

Here at FireHydrant, we are always looking for ways to improve and simplify incident management. Today we’re happy to announce a set of changes to the incident and retrospective pages that further simplify the Incident Command Center. To make it easier to stay up to date on the status of your incident, we have made the incident timeline permanently viewable on your Incident Command Center. You can adjust the width of the timeline so that the most important information stays visible at all times.

PagerDuty Integration Updates

To make it even easier to open incidents, FireHydrant now lets you open an incident from Slack in a single click. When an alert is ingested into FireHydrant, a message is posted to a channel of your choosing with an option to open an incident. When the incident is opened, it pulls in all the data from the PagerDuty alert and configures your incident with that data. Now you can go from an alert firing in PagerDuty to an open FireHydrant incident, with all of your automated processes, in under 5 seconds.

Key Fortinet and Flowmon Integrations: Automated Incident Detection and Response

Flowmon has recently joined Fortinet’s Open Fabric Ecosystem by integrating with FortiGate and FortiSIEM. This cooperation brings an automated system for threat detection and response that blocks security risks in their infancy and gives administrators time to carry out forensics.