The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.
Knowing who is in charge helps teams avoid confusion about whom to turn to during a crisis, allowing them to focus their efforts where needed. When the pressure is on, an incident commander should have an established response plan so that responders act quickly and coordinate efficiently. Actionable insights make this possible.
I joined Honeycomb as a Staff Site Reliability Engineer (SRE) midway through September, and it’s been a wild ride so far. One thing I was especially excited about was the opportunity to see Honeycomb’s incident retrospective process from the inside. I wasn’t disappointed! The first retrospective I took part in was for our ingestion delays incident on September 8th.
Moogsoft pioneered AIOps, essentially inventing the market 10 years ago. It is worth revisiting why we did that to understand where we are going. My background is as the founder and inventor of Micromuse Netcool and of RiverSoft’s OpenRiver technology.
Implementing integrations without a mountain of technical debt can be challenging. But it doesn’t have to be all bugs, burnout, and outages when shipping integrations at a high volume. We’ve unlocked a pattern at FireHydrant to rapidly build and release integrations without swiping the technical debt credit card each time, and that gave us a fast lane to building premier integrations.
At ilert, customers are already benefiting from our easy-to-set-up private or public status pages and auto-generated SLA uptime graphs for their business services. However, we decided to push the graph topic a bit further with custom metrics. Using ilert metrics, customers can showcase additional business data and insights into their services on their status pages.