The latest news and information on Site Reliability Engineering and related technologies.
Site Reliability Engineering (SRE) is a practice for managing the reliability of systems that began at Google in the early 2000s. Ben Treynor Sloss from Google started the first SRE team and coined the name.
We recently released Catchpoint’s SRE Report 2020, which analyzes results from the SRE survey we conducted early this year along with a recent addendum survey. The report offers a detailed look at the current state of SRE and how the shift to an all-remote work environment has impacted SRE teams. In this blog, we take a deeper look at one of the report highlights – ‘Heavy Ops Workload Comes at a Cost’.
Over the years there have been a bunch of great talks on site reliability and incident response. Below are a few we thought stood out (in no specific order) and are definitely worth a peek.
“Welcome to Tomorrowland.” That’s how Moogsoft Chairman and CEO Phil Tee kicked off the launch event of Moogsoft Express, the next-generation AIOps and observability solution built from the ground up for DevOps and SRE teams. The reference to a better future is fitting. With its arrival, Moogsoft Express helps these teams maintain visibility and control over increasingly complex CI/CD pipelines, so they can detect issues earlier, fix them faster and prevent outages.