
Latest Posts

SREview Issue #16 August 2021

We’re kicking off August with some thrilling news: Blameless has closed a $30M Series B funding round! We’re entering the next phase of our journey to advance reliability for engineering teams. We’re so grateful to our customers, collaborators, and the entire SRE community for their support! Let’s dive in with our favorite content for the month!

New Product Integration! Microsoft Teams Video

On the heels of our Microsoft Teams integration release to streamline incident management, we’re excited to share that we now support Microsoft Teams Video capabilities. We generate Microsoft Teams video conference links for each Blameless incident for fast and easy collaboration. Microsoft Teams Video joins Zoom, Google Meet, and GoToMeeting in our video integration suite.

Resilience in Action E9: Vulnerability, Compassion, and Post-Incident Reviews in the Emergency Room with Dr. Al'ai Alvarez

What can software engineers learn from the post-incident reviews that physicians do in the emergency room? In our ninth episode, Christina, a member of the Blameless strategy team, guest-hosts the podcast to interview both Kurt Andersen and Al'ai Alvarez, MD (@alvarezzzy). Dr. Alvarez is an assistant clinical professor of Emergency Medicine at Stanford. Clinically, he’s an emergency physician.

Reliability Matters. Blameless is Growing with Series B $30M Funding

When Blameless started in 2018, the team set out on a mission to help all engineers achieve reliability with less toil and risk. Three years in, that mission has become more important than ever. What has changed is the rate of SRE adoption: SRE is now the fastest-growing team and practice inside engineering. This represents a clear recognition of the many upsides that an SRE practice brings with its combination of continuous learning, velocity, and resilience.

What's the Difference between Observability and Monitoring?

Wondering what the difference is between observability and monitoring? In this post, we explain how they are related, why they are important, and some suggested tools that can help. The difference between observability and monitoring is that observability is the ability to understand a system’s state from its outputs, often referred to as understanding the “unknown unknowns”.

What is a Blameless Postmortem?

Do blameless retrospectives (or postmortems) help your team? We will explain what they are, whether they really work, and how to do them right. A blameless postmortem (or retrospective) is a post-incident document that helps teams figure out why an incident happened and brainstorm how to improve the process to prevent similar incidents from happening again. In most engineering organizations, everyone agrees that in complex systems, failure is inevitable.

Error Budgets That Work for You. Plus Support for New Relic Metrics and NR Query Language

Did you know that error budget policy is the key to making SLOs actionable? In fact, Twitter’s engineering team did not successfully adopt SLOs until they introduced error budgets. SLOs enable teams to quantify customer happiness, and error budgets enable teams to make data-backed tradeoffs between reliability and feature velocity. We believe that teams optimizing for reliability must adopt both.
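To make the relationship between an SLO and its error budget concrete, here is a minimal sketch of the usual arithmetic, assuming a request-based availability SLO. The function names (`error_budget`, `release_allowed`) and the "pause releases once the budget is spent" rule are hypothetical illustrations, not Blameless's product behavior or API.

```python
def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Summarize error budget usage for a request-based SLO.

    slo_target is a fraction, e.g. 0.999 for "99.9% of requests succeed".
    (Hypothetical helper for illustration only.)
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")

    # The budget is the share of requests that may fail without breaking the SLO.
    allowed_failures = (1.0 - slo_target) * total_requests
    return {
        "allowed_failures": allowed_failures,
        "remaining_failures": allowed_failures - failed_requests,
        "consumed_fraction": failed_requests / allowed_failures,
    }


def release_allowed(report: dict) -> bool:
    """Assumed example policy: ship features only while budget remains."""
    return report["consumed_fraction"] < 1.0


if __name__ == "__main__":
    report = error_budget(slo_target=0.999, total_requests=1_000_000, failed_requests=250)
    print(f"budget consumed: {report['consumed_fraction']:.0%}")
    print("releases allowed:", release_allowed(report))
```

With a 99.9% target over one million requests, roughly 1,000 failures fit in the budget, so 250 failures consume about a quarter of it. A real error budget policy would typically add a time window and graduated responses rather than a single threshold.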

Elephant in the Blameless War Room: Accountability

We’ve always advocated that every company can benefit from a blameless culture. Fostering a blameless culture can profoundly boost your organization in powerful ways, from employee retention to developer velocity and innovation. However, there’s an elephant in the room when we talk about blamelessness with executives: accountability. When things go wrong, people still need to get fired, right?

Resilience in Action E8: Vanessa Yiu on Crafting Enterprise Architecture

Resilience in Action is a podcast about all things resilience, from SRE to software engineering, to how it affects our personal lives, and more. Resilience in Action is hosted by Kurt Andersen. Kurt is a practitioner and an active thought leader in the SRE community. He speaks at major DevOps & SRE conferences and publishes his work through O'Reilly in quintessential SRE books such as Seeking SRE, What is SRE?, and 97 Things Every SRE Should Know.