Incident Management

The latest news and information on incident management, on-call, incident response, and related technologies.

Align ServiceOps with incident context to meet ITOps goals

ServiceOps is a technology-enabled approach that unifies IT operations and IT service management (ITSM) teams to improve incident management. In a recent survey of more than 400 global IT leaders by Enterprise Management Associates (EMA), 96% of respondents reported positive results from implementing the approach. Adoption rates are high: 75% have either an active effort or a formal initiative to streamline collaboration between ITSM and ITOps teams.

Round Robin escalation policies: do's and don'ts

The concept of Round Robin comes from sports. It has nothing to do with anyone called Robin; the name comes from the French word ruban (ribbon). In a Round Robin tournament, all participants face each other in turn. When applied to on-call schedules, a Round Robin escalation policy means that responders assigned to a level take turns responding to alerts. When is this strategy useful, and when isn't it?
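As a rough illustration of the turn-taking idea, here is a minimal Python sketch. It assumes the rotation advances once per incoming alert; the class name `RoundRobinLevel`, the responder names, and the alert labels are all hypothetical and do not reflect any particular on-call product.

```python
from itertools import cycle


class RoundRobinLevel:
    """Hypothetical escalation level that rotates alerts across its responders."""

    def __init__(self, responders):
        responders = list(responders)
        if not responders:
            raise ValueError("an escalation level needs at least one responder")
        # cycle() repeats the responders in order, indefinitely.
        self._rotation = cycle(responders)

    def assign(self, alert):
        """Pair the alert with the next responder and advance the rotation."""
        return alert, next(self._rotation)


# Hypothetical responders and alerts, for illustration only.
level = RoundRobinLevel(["alice", "bob", "carol"])
assignments = [level.assign(f"alert-{n}") for n in range(1, 5)]
```

With three responders, the fourth alert wraps around to the first responder again, so load is spread evenly as long as alerts arrive at a similar rate.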

Part I: #3 Virtual Meetup Rundeck by PagerDuty Asia Pacific OSS Community.

Customer Success Story: Samuel Kanagaraj (SRE Lead @ Telstra). Automate with Rundeck by PagerDuty! Explore the transformative power of automation through real-world success stories and expert insights. Hear firsthand from Samuel Kanagaraj, SRE Lead at Telstra, as he shares how automation has revolutionised their operations.

Part II: #3 Virtual Meetup Rundeck by PagerDuty Asia Pacific OSS Community.

Customer Success Story: Jared Vern & Christopher Gadd (Automation Engineers @ One New Zealand). Automate with Rundeck by PagerDuty! Explore the transformative power of automation through real-world success stories and expert insights. Jared Vern and Christopher Gadd, Automation Engineers at One NZ, discuss their experiences and the impact of automation on their workflows.

What is an Incident Timeline and How Do You Create One?

Incidents are unavoidable in software development and IT. As a Site Reliability Engineer (SRE), one of the tools you’ll use frequently is an incident timeline. The incident timeline provides a real-time report on any incident, including alerts, system updates, issue severity changes, manual chat entries, and more.
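One way to picture an incident timeline is as a list of timestamped events kept in chronological order. The sketch below is only an assumption about how such a structure might look; `TimelineEvent`, `build_timeline`, the `kind` values, and the sample entries are hypothetical and not taken from any specific tool.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class TimelineEvent:
    """Hypothetical timeline entry: when it happened, what kind, and a note."""
    at: datetime
    kind: str    # e.g. "alert", "severity_change", "chat"
    detail: str


def build_timeline(events):
    """Return the events sorted from earliest to latest."""
    return sorted(events, key=lambda event: event.at)


# Made-up sample entries, deliberately supplied out of order.
events = [
    TimelineEvent(datetime(2024, 1, 1, 12, 5, tzinfo=timezone.utc),
                  "severity_change", "Raised from SEV3 to SEV2"),
    TimelineEvent(datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
                  "alert", "High error rate on checkout"),
    TimelineEvent(datetime(2024, 1, 1, 12, 7, tzinfo=timezone.utc),
                  "chat", "Rolling back the last deploy"),
]
timeline = build_timeline(events)
```

Sorting on the timestamp means entries from different sources (alerts, severity changes, chat messages) can be merged into one ordered view.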

SRE vs. DevOps vs. Platform Engineering

Information technology has rapidly expanded to include a wide range of roles for managing and optimizing operational frameworks. Site Reliability Engineers (SREs), DevOps engineers, and Platform Engineers have become invaluable within this digital landscape. Here, you’ll learn more about each role, how they differ, and what they bring to the table.

Onboarding yourself as an engineer at incident.io

At incident.io we use infrastructure as code for configuring everything we can, and we feel that there’s no reason we should exclude our own product from that. As well as configuring things like Google Cloud Platform, Sentry and Spacelift via our infrastructure repo, we also configure incident.io. On your first day as an engineer here, the first PR that you make is to our infrastructure repo.

Runbooks vs Playbooks: Differences & How to Choose

Are you documenting your incident response process and unsure whether you should be writing a runbook or a playbook? Could these be two names for the same kind of document? Read on to learn about two different and complementary structures: playbooks and runbooks. The two are used in tandem, and because the terms are sometimes used interchangeably, they can be mistaken for one another.

Live Call Routing with Squadcast: Helping Teams Achieve Faster Resolutions

This is a recording of our webinar on how Squadcast's Live Call Routing is revolutionizing incident response for teams. In this informative session, you'll learn about the hidden costs of traditional incident reporting methods; how a dedicated phone line streamlines incident communication; Squadcast's easy-to-use, no-code setup for Live Call Routing; and real-world case studies showing how companies have drastically improved their MTTR.

On-Call Life: Setting Expectations

Imagine this: You’ve just been offered a new job in tech. Maybe it’s your first job right out of college, and you’ve only heard of being on-call in passing conversations up until this point. Or perhaps you’ve been in tech your whole life but never had to be on-call until today. Or maybe you’re contemplating whether on-call is for you because your company is dangling some extra cash (because who doesn’t like extra money?).