SRE

The latest News and Information on Site Reliability Engineering and related technologies.

[Webinar] Unlock self-service infrastructure monitoring with the Sensu Integration Catalog

May 5, 2022 By Sensu In Sensu

Introducing the Sensu Integration Catalog, a marketplace-like UX for simplifying new user onboarding and deploying production-ready monitoring in a matter of minutes. The Sensu Integration Catalog is also an open marketplace that new and existing users can contribute to by sharing Sensu configurations. Backed by an industry-leading monitoring-as-code solution, Sensu provides new users with a point-and-click interface to get started quickly, while facilitating DevOps and SRE automation best practices.

Are your SLOs realistic? How to analyze your risks like an SRE

May 4, 2022 By Ayelet Sachto In Google Operations

Setting up Service Level Objectives (SLOs) is one of the foundational tasks of Site Reliability Engineering (SRE) practices, giving the SRE team a target against which to evaluate whether or not a service is running reliably enough. The inverse of your SLO is your error budget — how much unreliability you are willing to tolerate.
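The relationship between an SLO and its error budget can be sketched in a few lines. This is a minimal illustration, not taken from the linked post: the function names and the 30-day window are assumptions chosen for the example, and the SLO is treated as a simple availability fraction.

```python
def error_budget(slo_target: float) -> float:
    """Return the error budget as a fraction: 1 minus the SLO target."""
    if not 0.0 < slo_target <= 1.0:
        raise ValueError("slo_target must be in the range (0, 1]")
    return 1.0 - slo_target


def allowed_downtime_minutes(slo_target: float, window_days: int = 30) -> float:
    """Translate the error budget into minutes of tolerated downtime.

    The 30-day default window is an assumption for illustration only.
    """
    minutes_in_window = window_days * 24 * 60
    return error_budget(slo_target) * minutes_in_window


# A 99.9% availability SLO over 30 days tolerates roughly 43.2 minutes of downtime.
print(f"{allowed_downtime_minutes(0.999):.1f} minutes")
```

Tightening the target from 99.9% to 99.99% shrinks the budget tenfold, which is one way to judge whether a proposed SLO is realistic for the risks a service actually faces.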

How to Achieve Measurable Reliability Results

May 4, 2022 By Emily Arnott In Blameless

Reliability is more important than ever. As users depend on services more and more, and competition in every sector grows, a great digital experience becomes the baseline for expectations, not the ceiling. It’s crucial to invest in making your software reliable enough to keep customers happy. But what does investing in reliability look like?

The Reverse Red Herring

May 4, 2022 By Geoff Townsend In Blameless

During an incident, time is fungible. At points it seems to go way too fast, and at others it seems to take an eternity for a command to complete. More important, however, is how it feels to be in an incident. It’s a heightened state of being, where any and every piece of information could be “the one” that helps crack open what is really going on. Likewise, there is an inherent distrust of incoming information.

CI/CD Pipeline | What It Is & How It Works

May 3, 2022 By Myra Nizami In Blameless

Wondering about CI/CD pipelines? We explain what the CI/CD pipeline is, the steps involved, and best practices along the way.

NewsKit API: The journey of building reliability into our systems at News UK

May 3, 2022 By Reliably In Reliably

Growing from a small start to serving billions of requests per month is never an easy path. Stoyan Yanev, Principal Engineer, and Krasimir Petrov, Senior Software Engineer, at News UK will show how they built their infrastructure and the decisions and compromises that had to be made along the way. The talk will center on the NewsKit API and the importance of reliability, before opening up a discussion among the group.

How To Reduce Technical Debt

May 2, 2022 By Aimee Pearcy In Reliably

Technical debt is the implied cost of the additional work that is required when a team chooses a quick, easy, but limited solution instead of a better-thought-out, higher-quality solution that would take longer. Essentially, it’s what happens when teams prioritize speed over quality. Examples of technical debt include untested code, unreadable code, dead code, duplicated code, and outdated documentation.

Objectively Speaking: Understanding the Power of Objectives

Apr 29, 2022 By Mick Roper In Reliably

Objectives help monitor different aspects of your services and systems, such as latencies, error rates, PRs that are open, the age of a bug, and more. These are all things that can drift away from what we think is good, and "what we think is good" is essentially what an objective is. Objectives help us to define what ‘good’ looks like.

How Do You Measure Technical Debt?

Apr 29, 2022 By Kerem Gocen In Reliably

Technical debt is one of the trade-offs today’s software teams make to speed up development, which in turn shortens time to market. That is mission-critical for most start-ups. Instead of dwelling on implementation details, or trying to cover edge cases that may affect a small fraction of end users at an early development stage, agile teams prioritize early and continuous delivery.

Post-Incident Review | Why It's Important & How It's Done

Apr 28, 2022 By Emily Arnott In Blameless

Curious about the post-incident review process? We give a complete explanation of post-incident reviews, why they are important, and the best practices to follow. What is a post-incident review? A post-incident review is an evaluation of the incident response process. The goal is to come away with clear actions that improve the incident response process and help prevent further incidents.
