Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on Service Reliability Engineering and related technologies.

Pragmatic Incident Response: 3 Lessons Learned from Failures

In my past experience as an SRE I’ve learned some valuable lessons about how to respond and learn from incidents. Declare and run retros for the small incidents. It's less stressful, and action items become much more actionable. Decrease the time it takes to analyze an incident. You'll remember more, and will learn more from the incident. Alert on pain felt by people — not computers. The only reason we declare incidents at all is because of the people on the other side of them.

Upcoming trends in DevOps and SRE

DevOps and SRE are domains with rapid growth and frequent innovations. With this blog you can explore the latest trends in DevOps, SRE and stay ahead of the curve. The past decade has seen widespread adoption of DevOps methodologies in software development. Unsurprisingly, as the needs of users change, DevOps techniques have evolved as well. In this blog we will look at the trends that are most likely to have a significant impact in the coming years.

Rootly Announces $3.2 Million in Seed Funding from XYZ Venture Capital, 8VC, & Y Combinator

Rootly is on a mission to create a world where maintaining reliability is frictionless, delightful, and accessible to anyone. Making resolving and learning from incidents every organizations superpower.

SRE Report 2021: The Highlights

Our fourth annual SRE Report launched last week. I had the good fortune to be involved in writing and editing it this year for the first time alongside our very own driving force Leo Vasiliou and the brilliant Eveline Oehrlich at DevOps Institute (check out Eveline’s take on the report’s Key Takeaways here), in addition to a number of folks at VMware Tanzu.