Blameless

This Is the Most Underappreciated Skill for SREs

Delivering great software and sustainable systems is a team sport. Without the support of all stakeholders, adoption initiatives often fail. In successful initiatives, SREs are responsible for bringing together all resources and team members to help resolve reliability-related issues. But bringing these resources together takes much more effort than people think. SREs do a lot of glue work to make sure these collaborative efforts happen.

Little Known Ways to Better Use Your Error Budgets

One of the most versatile and foundational SRE tools is the SLO, or service level objective. The SLO is a threshold set for key reliability metrics. When incidents push the metric over the threshold, a response launches to prevent further damage. Conversely, as long as you meet your SLO, you can continue to ship new code. The space you have before you breach this threshold is the error budget.
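As a rough illustration of the arithmetic behind an error budget, here is a minimal sketch. It assumes a request-based availability SLO, and the names `BudgetStatus` and `error_budget_status` are hypothetical, not part of any Blameless product or API.

```python
from dataclasses import dataclass


@dataclass
class BudgetStatus:
    allowed_failures: float   # failures the SLO tolerates in the window
    remaining: float          # failures left before the SLO is breached
    consumed_fraction: float  # share of the budget already spent
    can_ship: bool            # True while some budget remains


def error_budget_status(slo_target: float,
                        total_requests: int,
                        failed_requests: int) -> BudgetStatus:
    """Compute error budget usage for a request-based SLO (illustrative only)."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0 or failed_requests < 0:
        raise ValueError("request counts must be positive / non-negative")

    # The budget is the share of requests the SLO allows to fail.
    allowed = (1.0 - slo_target) * total_requests
    remaining = allowed - failed_requests
    return BudgetStatus(
        allowed_failures=allowed,
        remaining=remaining,
        consumed_fraction=failed_requests / allowed,
        can_ship=remaining > 0,
    )


if __name__ == "__main__":
    # Hypothetical numbers: a 99.9% SLO over one million requests.
    print(error_budget_status(0.999, 1_000_000, 250))
```

With these assumed numbers, the SLO tolerates roughly 1,000 failed requests, so 250 failures consume about a quarter of the budget and shipping can continue. A real policy would likely also consider the time window and burn rate, which this sketch leaves out.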

Modern Operations Best Practices from Engineering Leaders at New Relic and Tenable

As reliability shifts left, more companies are adopting SRE best practices. These practices go beyond conducting incident retrospectives. At their heart are a blameless culture and a desire to grow from each incident. In a recent industry leaders’ roundtable hosted by Blameless, top experts discussed how teams can embrace SRE best practices and make cultural shifts towards blamelessness.

How to Cut Cloud Costs for 2021 Using Blameless

Blameless Incident Management is a tool for managing production incidents, but its flexibility lets it support many other use cases. Here at Blameless, we try to “dogfood” our product as much as possible, so we’ve taken to using the Incident Management (IM) feature for many other aspects of our daily work, not just system outages. One use case that I’m particularly fond of is using the tool to drive alignment and collaboration around long-term infrastructure projects.

SREview Issue #8 December 2020

🎼 Frosty the SRE/ Was a jolly happy soul/ With his runbooks tight and automated/ and SLOs made out of gollldddddd! 🎼 It’s the most wonderful time of the year, and to celebrate, here’s your December issue of the SREview! This monthly zine features epic Tweets, content, and events happening in the SRE and resilience engineering community.

Introducing Blameless Runbook Documentation

At Blameless, our mission is to provide teams with the tools they need to operationalize SRE and embrace a culture of resilience. We help teams automate toil and adopt best practices across integrated incident management, comprehensive retrospectives, service level objectives, reliability insights, and more. We are very excited to announce that teams now have a new tool in their tool belts with our latest launch. Blameless Runbook Documentation is now available for early access.

Here are the Top Predictions for SRE in 2021

Who else is glad that 2020 is almost over? We’ve had one of the most difficult years in recent history. With everything going on, it’s been difficult to think further than a few days out, much less into the new year. But we’re hopeful that 2021 will be a better year for everyone, and we’re predicting some exciting things ahead for SRE.