Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Demo Roundups! Automation Standardization (Workflows)

Join PagerDuty’s Solutions Consultants Bobby Zimmerman and Justyn Roberts to discover how combining technical automation with human-driven processes can reduce manual interventions, streamline repetitive tasks, and increase operational efficiency. Level up your digital operations expertise with PagerDuty Demo Roundups — a series of live, interactive webinars where you can deepen your knowledge in the Operations Cloud and see how PagerDuty can work for you. Each 1-hour session presents a hands-on demo that showcases PagerDuty’s capabilities in real-time followed by Q&A.

Site Reliability Engineer's Guide to Black Friday

It’s gotten to the point where Black Friday reliability prep has to start on…well Black Friday. This year, 32% of consumers in the US claimed that they were going to start their holiday shopping in July-October. Plus, Black Friday isn’t the only day eCommerce businesses have to worry about, now we have Cyber Monday, Travel Tuesday, and the thousands of Prime Days from Amazon.

Lessons from 4 years of weekly changelogs

Writing a meaningful update for customers every week has been held sacred at incident.io since we started the company. We've written over 200 of them in the past 4 years, and we recently celebrated going 2 years straight without missing a single a single week The numbers themselves are not the goal, but the consistency of this habit and what it represents for our customers and our team is very real, and special to me.

Operationalizing AI for IT operations

Advances in artificial intelligence are rapidly transforming the IT operations landscape. According to Enterprise Strategy Group, 85% of organizations use or plan to deploy AI across many functional areas, including IT operations. Among its many benefits, AI can help ITOps teams: AI has immense potential to transform how IT operations, service management, and infrastructure teams function. Adoption is the first step toward creating organizational change.

Did Delta's slow web performance signal trouble before CrowdStrike?

The CrowdStrike outage was a reminder of how quickly the dominoes can fall—especially when the foundation is shaky. Delta Airlines was hit harder than its competitors. While United and American Airlines were able to recover within days, Delta faced ongoing struggles, leading to the cancellation of 7,000 flights over five days.