Operations | Monitoring | ITSM | DevOps | Cloud

Getting over on-call anxiety

You've joined a company, or worked there a little while, and you've just now realised that you'll have to do on-call. You feel like you don't know much about how everything fits together, how are you supposed to fix it at 2am when you get paged? So you're a little nervous. Understandable. Here are a few tips to help you become less nervous.

Equitably distribute on-call responsibility and streamline incident response with Round Robin Scheduling

PagerDuty is excited to introduce Round Robin Scheduling. Round Robin Scheduling allows teams to equitably distribute on-call shift responsibilities amongst team members. Automatically assigning new incidents across different users or on-call schedules on an escalation level ensures that teams are resolving incidents as efficiently as possible. And, by balancing the workload across multiple users, there’s less risk of burnout.

The Human Side of Being On-call: 5 Lessons for Managing Stress, Anxiety, and Life While Being On-call

Within DevOps, we talk a lot about the on-call process—but what about the human side of being on-call? For example, what are effective ways of managing stress and anxiety during a shift? How can one manage life situations that make being on-call difficult—such as being responsible for watching the kids during an on-call rotation? And how can an empathic team culture help prevent burnout and turnover?

On-call by default

Like many SaaS businesses, we have an on-call rota to enable us to provide 24x7 cover if there are problems with incident.io. We have a 'pager' which will alert the relevant person if something unexpected happens in our app, so that they can investigate and fix it if needed. Note: This was adapted from an internal document we wrote about how we think about on-call at incident.io.

Ask Miss O11y: I Don't Want to Be On Call Anymore. Am I a Monster?

First, I’d like to say that pager duty isn’t something we should treat like chronic pain or diabetes, where you just constantly manage symptoms and tend to flare-ups day and night. Being paged out of hours is as serious as a fucking heart attack. It should be RARE and taken SERIOUSLY. Resources should be mustered, product cycles should be reassigned, until the problem is fixed.

Publish SIGNL4 oncall and alert information to your Grafana dashboard

Grafana is as open source analytics and interactive visualization application. You can connect different data sources to display chart and graphs or even trigger alerts. Wouldn’t it be great to add information about SIGNL4 alerts or about who is on call as part of your dashboard? In this case you immediately get an overview about open, acknowledged, and closed alerts per category. Of you can see wo it currently on duty. Here is an example with a who-is-on call, and an alert overview panel.