Operations | Monitoring | ITSM | DevOps | Cloud

Incident Management

The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Understanding Cloud Services: IaaS, SaaS, and PaaS

Cloud services have skyrocketed in popularity in the past few years, providing a vast array of resources as well as a cost-effective path for the migration from on-premises servers to the cloud. In fact, cloud services are handling all the computing needs of many businesses. It’s very likely you’re already using cloud services and will continue to use more as time goes on.

PagerDuty Incident Response Demo (Extended)

Enjoy this demo that showcases a day in the life of a team handling an incident with PagerDuty's Automated Incident Response solution. PagerDuty enables teams to orchestrate the right response for every incident. It also helps organizations protect revenue and improve customer experiences by resolving critical incidents faster and preventing future occurrences. Now you can bring major incident best practices to your organization with end-to-end response automation and friction-free postmortems.

Arize integration with PagerDuty

Streamline Model Monitoring with Integrated Alerts Arize is an ML Observability platform aimed to detect, troubleshoot, and eliminate ML problems faster. Use Arize to monitor your production models and send alerts to PagerDuty when your models deviate from a certain threshold. Arize and PagerDuty help keep your teams in the loop, send more comprehensive metadata through alerts, and debug your models faster than ever before.

Using Squadcast's SLO Tracker | Error Budget | Setting up SLOs and configuring SLIs | Squadcast

With Squadcast, you can define and monitor Service Level Objects for your services. SLOs allow you to define and enforce an agreement between two parties regarding the delivery of a given service. A Service Level Objective (SLO) is a reliability target, measured by a Service Level Indicator (SLI), and sometimes serves as a safeguard for a Service Level Agreement (SLA). SLOs represent customer happiness and guide the development team’s velocity.

Interrupts in software teams: using unplanned work to your advantage

Interrupts are often seen as a problem that eats away at your team’s productivity, and gets in the way of shipping important things for your customers. It’s often consciously accrued from the tech debt we accept to ship features sooner. However when a team doesn’t have a good strategy for dealing with the consequences of those decisions, the pain is felt much more acutely and much sooner.