Operations | Monitoring | ITSM | DevOps | Cloud

Latest Posts

Blameless culture drives incident learning and other key insights from Catchpoint's 2022 SRE Report

SRE is a constantly evolving field, responding to the challenges of increasing reliance on tech and the opportunities of its evolving abilities. Reliability has to remain a step ahead of the cutting edge, whether it’s navigating remote work, implementing AI assistance, or optimizing internal processes. But how do we know that SRE is keeping up? ‍ We’re proud and excited to announce the results of the SRE Survey we ran in partnership with Catchpoint.

For incident management, should you build or buy?

Is your incident response held together by a thread? Are you manually recording incident updates in a shared doc? Do you struggle to juggle the incident management workload with your other responsibilities? Does everyone on-call report data the same way? These are all common problems faced by DevOps teams still relying on homegrown incident management tooling.

Service Level Management Process Explained (with Examples)

‍ Service Level Management, or SLM, is defined as the process of negotiating Service Level Agreements and ensuring that they are met. ‍ Service Level Management is a fundamental part of SRE and DevOps. It encompasses the expectations and perceptions that both the business and the customer have about the service and its performance. Service level management will include existing and new services as they are added, with the service level agreements (SLAs) being modified accordingly.

What's difficult about problem detection? - Three Key Takeaways

Welcome to episode 4 of our webinar series, From Theory to Practice. Blameless’s Matt Davis and Kurt Andersen were joined by Joanna Mazgaj, Director of Production Support at Tala, and Laura Nolan, Principal Software Engineer at Stanza Systems. They tackled a tricky and often overlooked aspect of incident management: problem detection. ‍