The latest News and Information on Service Reliability Engineering and related technologies.

"Things get SREious": SRE from Home Recap

With SRECon not happening this year and the world turned upside down by COVID-19, we set out to hold a virtual event that would bring SREs together to share their experiences of what has changed. Last week’s SRE from Home was exactly that. With 1,900 registrants, 20 lively Slack channels, six illuminating and entertaining talks from a diverse range of experts in the field, and our #askanSRE panel answering attendees’ questions with candid generosity, it was an amazing, jam-packed day.

Evan Niedojadlo from Peddle shares his thoughts on being an SRE

Evan Niedojadlo is an SRE at Peddle based in Austin, TX. He currently works on a small team covering the SRE, Ops, and Security areas of the organization. In his free time, he enjoys building communities, reading, music, helping others learn, and being outside.

How to Choose Monitoring Tools for DevOps and SRE

When developing for reliability or implementing resilient DevOps practices, the heart of your decision-making is data. Without carefully monitoring key metrics like uptime, network load, and resource usage, you won’t know where to spend development effort or which operational practices to refine. Fortunately, a wide variety of monitoring tools are available to help you collect this data and get visibility into it.

Promoting Continuous Learning with SRE

With the extreme changes we’ve all been through these last several months, it should come as no surprise that our jobs have changed drastically, too. We’re working remotely. We’re dealing with increased resource constraints. Our services are receiving more traffic than usual, and we’re tasked with keeping things up and running. Our work-as-done may not match what we did at the beginning of 2020.

SRE Report 2020 - Balancing 'Dev' and 'Ops'

We recently released Catchpoint’s SRE Report 2020, which analyzes results from the SRE survey we conducted early this year along with a recent addendum survey. The report offers a detailed look at the current state of SRE and how the shift to an all-remote work environment has impacted SRE teams. In this blog post, we take a deeper look at one of the report highlights: ‘Heavy Ops Workload Comes at a Cost’.

SRE Leaders Panel: Managing Systems Complexity

In our previous panel, we spoke about how to overcome imposter syndrome in high-tempo situations, and how culture directly affects the availability of our systems. Building on that discussion, we gathered leading minds in the resilience industry to discuss how SRE can manage systems complexity, and how that is tightly intertwined with business health, especially in the context of the current health and social crises.