
SRE

The latest news and information on Site Reliability Engineering and related technologies.

Post-Incident Review | Why It's Important & How It's Done

Curious about the post-incident review process? We explain what post-incident reviews are, why they are important, and which best practices to follow. What is a post-incident review? A post-incident review is an evaluation of the incident response process. The goal is to produce clear actions that improve incident response and help prevent further incidents.

Jira Integrations with Blameless platform

In this video, our Solutions Engineer walks you through creating Jira tickets and follow-up actions in Blameless. You'll learn how to use our Slack integration for quick ticket creation and how to create tickets from within the Blameless platform. You'll also see how closing a ticket in Jira automatically closes the corresponding ticket in Blameless, and how to manage open tickets and incidents in an organized Blameless dashboard.

SRE vs DevOps: What's The Difference?

Whether you’ve heard of or fully jumped on the DevOps or SRE bandwagon, you may have also wondered how the two relate. What’s the difference? Are they really just different ways of looking at the same problem? The term DevOps hit the market first, but SRE wasn’t too far behind. And though they have different origin stories, they both focus on autonomy, automation, and iteration. So why do these paradigms exist? And why do we need both? Let’s look at this further.

SRE: From Theory to Practice | What's difficult about on-call?

We launched the first episode of a webinar series to tackle one of the major challenges facing organizations: on-call. SRE: From Theory to Practice - What’s difficult about on-call sees Blameless engineers Kurt Andersen and Matt Davis joined by Yvonne Lam, staff software engineer at Kong, and Charles Cary, CEO of Shoreline, for a fireside chat about everything on-call. As software becomes more ubiquitous and necessary in our lives, our standards for reliability grow alongside it.

SRE Adoption | A 2-Year Retrospective (From A Business Point-Of-View)

This month I hit my 2-year anniversary with Blameless, and as our industry progresses and matures, I thought it would be a good opportunity to look back, review how far we have come, and ruminate on where we’re headed. Our shared vision at Blameless is to help engineering teams adopt reliability practices with ease and advance to a resilient culture.

The State of Incidents and Site Reliability: Q&A with Blameless SRE Architect Kurt Andersen

In the latest of an occasional series, today we hear from Kurt Andersen, SRE Architect at Blameless, discussing the evolution of incident management, current trends in site reliability affecting engineering teams, as well as an update on how Blameless is addressing the needs of SRE and DevOps.

Managing Burnout | Tips To Minimize The Impact

Burnout is real. Today, the source of burnout can be anything from pandemic fatigue, to the onslaught of political divisiveness, or simply the pace of life worldwide. Whatever the culprit, we’re living in a stressful time. People working in cloud native environments definitely feel burnt out. Silicon Valley investor Marc Andreessen famously said, “Software is eating the world,” and that seems to be quite true. High demand is fueling churn. System and cloud operators feel pressure.