Blameless

A "Retrospective" of Amy Tobey's "The Future of DevOps is Resilience Engineering"

On April 22, 2020, at 11:20 AM PST, Amy Tobey began her talk “The Future of DevOps is Resilience Engineering” at Gremlin’s Failover Conf. The talk used key concepts from DevOps as a way to understand resilience engineering. Amy began by leading the audience in a group breathing exercise of three deep breaths, then spoke about the yoga practice of pranayama as a way to understand DevOps.

Reflections on Gremlin's Failover Conf

On April 21, 2020, thousands of industry professionals came together virtually to attend a revolutionary conference, Gremlin’s Failover Conf. With dozens of cancelled events, social distancing policies, and heightened stress due to the current crisis, it was more necessary than ever to take a moment to learn, share, and talk to one another about something we are all passionate about. We loved the experience at Failover Conf and want to share some of our favorite parts with you.

Getting SRE Buy-in from C-Levels for Error Budgets and SLOs, Part 3

You now have postmortems properly implemented, automated, and well-structured. You’re generating reports and data automatically based on all your incidents. Two levels of management have agreed to your SRE buy-in efforts. That is a huge accomplishment! If you’re here, you’re gaining real traction in adopting SRE best practices, but the battle is not won yet. The hardest, but most strategically important, effort will be proving to your C-levels why they should buy into SRE.

Thought Leadership Panel: What is a "real" SRE?

Blameless recently had the privilege of hosting SRE leaders Craig Sebenik, David Blank-Edelman, and Kurt Andersen to discuss how SREs can approach work as done vs. work as imagined, how to define SRE and DevOps and the complementary nature of the two, the ethics of purchasing packaged versions of open source software, and more. The transcript below has been lightly edited, and if you’re interested in watching the full panel, you can do so here.

Getting SRE Buy-in from a VP or Director for Automated Metrics and Continuous Learning, Part 2

After getting managerial approval for incident management, your SRE buy-in program is well underway. How can you prove that it’s effective, and that adopting more best practices is necessary? In part 2 of this blog series, we’re going to share how to convince a VP or director to invest in additional SRE practices to strategically improve business results: automated metrics and continuous learning.

Getting SRE Buy-in from a Manager or Lead for Incident Response, Part 1

Adopting SRE best practices can be difficult, especially when you need approval from managers, VPs, CTOs, and everything in between. In this blog series, we will walk you through how to come up with a winning pitch for each level of leadership to ensure that SRE buy-in will succeed in your organization. Let’s start at the beginning with your team lead or manager.

Resilience in Action, Episode 1: Narratives in Incidents with Lorin Hochstein

Resilience in Action is a podcast about all things resilience, from SRE to software engineering, to how it affects our personal lives, and more. Resilience in Action is hosted by Blameless Staff SRE Amy Tobey. Amy has been an SRE and DevOps practitioner since before those names existed. She cares deeply about her community of SREs and wants to take what she’s learned over the 20+ years of her career to help others. In our very first episode, Amy chats with Netflix software engineer Lorin Hochstein.

Technology Innovation Snapshot: How Blameless Accelerates Team Performance

In the March edition of Digital Enterprise Journal’s Technology Innovation Snapshot, Blameless was listed alongside 11 other companies as a promising vendor. Blameless is honored to be named with companies such as Gremlin, Catchpoint, and Moogsoft, and is excited about the future DEJ sees for the SRE space.

How SREs Can Embrace Resilience During Crises

Blameless recently had the privilege of hosting SRE leaders Liz Fong-Jones, Dave Rensin, and Alex Hidalgo to discuss how SREs can embrace resilience during a pandemic, and how the principles of SRE intersect with global trends. The transcript below has been lightly edited, and if you’re interested in watching the full panel, you can do so here.

Best Practices for Pragmatic Incident Command

The goal of this piece is to provide some practical advice on how teams can coordinate and respond to complex, dynamic incidents. After all, incidents are unplanned investments that surface valuable learnings for improvement. For the purposes of this blog, we define incidents as situations where there is a need for coordination among multiple people working on the same problem. There will be incidents where this is not the case.