The latest news and information on Site Reliability Engineering and related technologies.

SRE Leader Panel: Business Agility is what matters, SRE can help you get there

Ready for another SRE Thought Leader Panel? This one’s theme is “Business Agility is what matters, SRE can help you get there.” We’re chatting about topics like the value of crisis during incident response, the best and worst tech transformations we’ve seen, how reliability impacts the flow of value, and more. This panel is hosted by Chris Hendrix, staff software engineer at Blameless, and features guests.

What is Site Reliability Engineering [Simple Intro to SRE]

Wondering what SRE is all about? We explain what it is, how it works, why it was developed, and how it can help your organization. So what is Site Reliability Engineering (SRE)? It is a methodology that fuses software and operations teams, with the goal of producing reliable, resilient, and scalable systems. SRE was developed by Google engineer Ben Treynor Sloss in 2003. Google’s goal was to increase the reliability of its sites and services.

Using Coralogix + StackPulse to Automatically Enrich Alerts and Manage Incidents

Keeping digital services reliable is more important than ever. When something goes wrong in production, on-call teams face significant pressure to identify and resolve the incident quickly to keep customers happy. But it can be difficult to get the right signals to the right person in a timely fashion.

Resilience in Action E6: Oversize Coffee Mugs, SLOs, and ML with Todd Underwood

Resilience in Action is a podcast about all things resilience, from SRE to software engineering, to how it affects our personal lives, and more. Resilience in Action is hosted by Kurt Andersen. Kurt is a practitioner and an active thought leader in the SRE community. He speaks at major DevOps & SRE conferences and publishes his work through O'Reilly in quintessential SRE books such as Seeking SRE, What is SRE?, and 97 Things Every SRE Should Know.

Creating Custom Slack Commands

Site Reliability Engineers are expected to know everything that’s happening, all of the time. That’s a lot of things! To help you sift through the noise, we’ve developed a feature that lets you find accurate data about your organization on demand. You can do this by sending custom-designed commands to FireHydrant directly from your integrated Slack account.