The latest News and Information on Service Reliability Engineering and related technologies.

Site Reliability Chats (Mar 2, 2022)

Mar 2, 2022 By Gremlin In Gremlin

Welcome to the first episode of Site Reliability Chats with your hosts Jason Yee @gitbisect and Julie Gunderson @julie_gund.

Quickly troubleshoot application errors with Error Reporting

Feb 28, 2022 By Eyamba Ita In Google Operations

Are you familiar with the four golden signals of Site Reliability Engineering (SRE): latency, traffic, errors, and saturation? Whether you’re a developer or an operator, you’ve likely been responsible for collecting, storing, or analyzing the data associated with these concepts. Much of this data is captured in application and infrastructure logs, which provide a rich history of what is happening behind the scenes in your workloads.

Traditional vs Modern Incident Response

Feb 24, 2022 By Kristijan Mitevski In Squadcast

An incident is an event (network outage, system failure, data breach, etc.) that can lead to loss of, or disruption to, an organization's operations, services or functions. Incident response is an organization’s effort to detect, analyze and correct the hazards caused by an incident. Most commonly, the term refers to security incidents. Sometimes incident response and incident management are used more or less interchangeably.

Service Level Objectives: Where do we start?

Feb 22, 2022 By Last9 In Last9

Most of us have heard about SLOs and what they mean, but have always found it hard to start adopting them across our teams. This video demystifies the journey of adopting SLOs, with examples of how several large companies like Disney adopted them. Whether you are new to the DevOps/SRE world or an experienced developer, you will learn a fresh approach to making software more reliable!

Everything you need to know about Squadcast and Microsoft Teams Integration

Feb 21, 2022 By Vishal Padghan In Squadcast

Microsoft Teams is one of the most versatile collaboration and chat tools used by enterprises. We at Squadcast understand how important Microsoft Teams can be for your organization. Hence, we bring you this blog on the Squadcast-Microsoft Teams integration, which explains how it can improve incident management, enable effective collaboration and a lot more.

Top 13 Site Reliability Engineer (SRE) Tools

Feb 20, 2022 By Jacob Hall In Dotcom-Monitor

The role and responsibilities of a site reliability engineer (SRE) may vary depending on the size of the organization, and so do the tools an SRE uses. For the most part, a site reliability engineer is focused on multiple tasks and projects at one time, so for most SREs, the various tools they use reflect their ever-evolving responsibilities.

Why and How SREs Can Benefit from Feature Flags

Feb 17, 2022 By Weihan Li In Rootly

When you think of who uses feature flags, your mind most likely goes to developers. In general, feature flags are closely associated with software engineering. But Site Reliability Engineers, too, can benefit from feature flags. SREs may not be the ones to create feature flags, but they should work closely with developers to ensure that the applications their teams support include feature flags.

Cloud Complexity - Bringing Resources together in Multi-cloud Environments

Feb 15, 2022 By Caleb Munyasya In Squadcast

The world is still getting used to operating within the cloud. Moving to the cloud is challenging for many organizations. So why do we see a rise in the adoption of multicloud strategies? In this blog, we will explore why this trend is worth considering for your organization, as well as look at the challenges that it brings.

How We Define SRE Work

Feb 15, 2022 By Fred Hebert In Honeycomb

At the time of writing this post, I have officially been at Honeycomb for one year as a site reliability engineer (SRE). I shared my initial experiences and impressions in an earlier post and thought it would make sense to check back in now that I’ve had the opportunity to spend time learning about the team, the culture, and the code base in more depth.
