Incident Management

The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

How to speed up incidents with a lot of cooks in the kitchen

In one of our recent webinars we discussed a substantial challenge IT Ops teams face in today’s complex IT environments: defining and clearly communicating incident and operational roles and processes in order to create a well-coordinated incident management lifecycle. This lifecycle is essential for restoring service as quickly as possible when disruptions occur. The highlights of that discussion follow; they were also recently published in an APMdigest article.

9 Barriers to DevOps Implementation

The DevOps model unites development and IT operations in a shared organizational culture aimed at achieving business goals more efficiently. Formerly siloed teams can now collaborate continuously to build more robust products with increased confidence. The model has the power to transform operations, but there are barriers to DevOps that must be overcome first.

Why Your APIs Should Fly First Class

Picture yourself flying first class. You board the plane first, you get champagne, and you feel as though you’re the most important person on board. Why not treat your APIs the same way? In this talk, FireHydrant CEO and Co-Founder Robert Ross (a.k.a. @bobbytables) shares why putting your APIs first can be a game-changer for your business and how this mindset shaped the way FireHydrant was built.

How to Build an SRE Team with a Growth Mindset

The biggest benefit of SRE isn’t always the processes or tools, but the cultural shift. Building a blameless culture can profoundly change how your organization functions. Your SRE team should be your champions for cultural development. To drive change, SREs need to embody a growth mindset. They need to believe that their own abilities and perspectives can always grow, and encourage this mindset across the organization.

How to get mobile push notifications from Spike.sh

When an issue occurs in your production software, the right channel for the alert depends on multiple factors. If it's a critical issue requiring immediate attention, you should alert the team member via phone call. But not all issues require a phone call, and a phone that keeps ringing for minor issues quickly becomes annoying. This is where other channels like SMS, Slack and mobile push notifications come in.
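
The idea of matching the alert channel to the urgency of the issue can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Severity` levels, the `CHANNEL_BY_SEVERITY` mapping and the `pick_channel` function are hypothetical names made up for this example, and none of it reflects the actual Spike.sh API or configuration.

```python
from enum import Enum


class Severity(Enum):
    """Hypothetical severity levels; real tools may define their own."""
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


# Assumed mapping for illustration only: louder channels for more urgent issues.
CHANNEL_BY_SEVERITY = {
    Severity.CRITICAL: "phone_call",
    Severity.HIGH: "sms",
    Severity.MEDIUM: "push_notification",
    Severity.LOW: "slack",
}


def pick_channel(severity: Severity) -> str:
    """Return the alert channel assumed appropriate for the given severity."""
    return CHANNEL_BY_SEVERITY[severity]


if __name__ == "__main__":
    for level in Severity:
        print(f"{level.value}: {pick_channel(level)}")
```

In practice the choice usually depends on more than severity alone (time of day, who is on call, whether the alert was acknowledged), so a real routing policy would likely take more inputs than this sketch does.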

Alert Fatigue and Your Health

As an on-call engineer, you might deal with alerts day in, day out. These alerts may come from your alerting provider (PagerDuty, OpsGenie, etc.), Slack notifications telling you the site is down, or the ever-concerning text message "Hey, is the site down?" These alerts elicit reactions that range from "shit" to "again?" and in many cases, both.

How We Built and Use Runbook Documentation at Blameless

Even if you don’t notice it, you are executing runbooks every day, all the time. When you have an incident in your day-to-day operations, you follow a series of ordered and connected steps to solve it. For instance, if you lose your internet connection, you will follow a series of steps to resolve that issue. The exact steps could differ depending on your method, but you get the idea.

IT Trends You Don't Want to Miss

The COVID pandemic has redefined the workplace and accelerated the process of digitization for many. Organizations are migrating to systems that are flexible, distributed and resilient. Per Gartner, IT spending will reach $3.9 trillion worldwide in 2021. IT teams will be channeling investments into enterprise software as remote work becomes essential. Systems that support remote work will see a growth of 8.8 percent this year.