The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Setting up Runbooks in Squadcast | SRE Best Practices | Squadcast

A Runbook is a compilation of routine procedures and operations that are documented for reference while working on a critical incident. Sometimes, it can also be referred to as a Playbook. From this video, learn to create, attach, reference and mark progress for incident resolution using Runbooks.

What's New: Updates to Incident Response, PagerDuty Process Automation, Integrations, and More!

Following another successful PagerDuty Summit, development continues across several areas of the product. We’re excited to announce a new set of updates and enhancements to the PagerDuty Operations Cloud. Recent updates from the product team include Incident Response, PagerDuty® Process Automation and PagerDuty® Runbook Automation, Partner Integrations & Ecosystem, as well as Community & Advocacy Events updates.

Release Notes: Process Automation and Rundeck OSS 4.4.0

Product managers Forrest Evans and Jake Cohen show off new features and enhancements in PagerDuty Process Automation and Rundeck Open Source version 4.4.0. Version 4.4.0 features two new plugins for #AWS: #Lambda Custom (ephemeral) scripts and #ECS/#Fargate Commands. For more details on other improvements in this release, see the full Release Notes.

Automating Common Diagnostics for Kubernetes, Linux, and other Common Components

This is the second piece in a series about automated diagnostics, a common use case for the PagerDuty Process Automation portfolio. In the last piece, we talked about the basics around automated diagnostics and how teams can use the solution to reduce escalations to specialists and empower responders to take action faster. In this blog, we’re going to talk about some basic diagnostics examples for components that are most relevant to our users.

3 common pitfalls of post-mortems

Small confession: we currently use the term 'post-mortem' in incident.io despite preferring the term 'incident debrief'. Unless you have particularly serious incidents, the link to death here really isn’t helping anyone. However, we're optimising for familiarity, so we're sticking to the term 'post-mortem' here. Ask any engineer and they’ll tell you that a post-mortem is a positive thing (despite the scary name).

Zero Trust Security: Key Concepts and 7 Critical Best Practices

Zero trust is a security model to help secure IT systems and environments. The core principle of this model is to never trust and always verify. It means never trusting devices by default, even those that are connected to a managed network or were previously verified. Modern enterprise environments include networks consisting of numerous interconnected segments, services, and infrastructure, with connections to and from remote cloud environments, mobile devices, and Internet of Things (IoT) devices.

What Is a Secure SDLC?

The Software Development Lifecycle (SDLC) framework defines the entire process required to plan, design, build, release, maintain and update software applications, including the final stages of replacing and decommissioning an application when needed. A Secure SDLC (SSDLC) builds on this process, integrating security at all stages of the lifecycle. When migrating to DevSecOps (collaboration between Development, Security, and Operations teams), teams typically implement an SSDLC.

StatusCast Top Picks: 10 More Awesome Customer IT Status Pages

IT services are a critical backbone to the operations and functioning of almost every business and organization. As more and more IT departments have embraced the need for good governance, this has driven greater transparency. From the perspective of IT service management, this has manifested itself as much greater openness when communicating about IT service availability.