The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.
Dexcom is more than a business. For its customers, the organization’s innovative continuous glucose monitoring platform provides a way to take control of their health and better manage their diabetes. Given the critical services Dexcom provides, its IT Operations teams have highly specific needs when it comes to the many tools and platforms they rely on to keep the organization’s services up and running.
At a going-away party for a job I was leaving a few years back, my VP of engineering told a story I didn’t even remember, but that I know subconsciously shaped how I viewed my role on that team: toward the end of my very first day at the company, there was some internal system issue, and with pretty much zero context, I pulled out my laptop, figured out what was going on, and helped fix it.
From a single on-call engineer hopping online to resolve a problem, to a massive cross-team effort that brings in even the most senior technical leadership (CTO, CISO, or CIO), incident response teams are lucky when they’re able to resolve issues before a customer is aware. But in the cases where there is customer impact, other stakeholders like sales and customer service need to be informed and updated as well.
If you have an on-call rotation, you want it to be a healthy one. But health is hard to measure because its qualities are so abstract. For example, are you feeling burnt out? Does it feel like you’re supported properly? Is there a sense of impending doom? Do you think everything is under control? Is it clashing with your own private life? Do you feel adequately equipped to deal with the challenges you may be asked to meet? Is there enough room given to recover after incidents?
Failures are a common part of any system’s lifecycle, so what would the Root Cause Analysis be for this type of problem? If you build and deploy a system, there is a high chance that you’ll have to deal with a failure in the near future. What matters is how you handle such failures. As an organization, you need pre-formulated strategies to handle failures as and when they occur.
We are super excited to announce a major milestone in our company history. Ten years ago, iLert started with a simple mission: help companies increase their uptime and deliver a seamless digital experience. Every feature in iLert is built to help you respond to critical alerts faster and increase your uptime.