SRE

The latest news and information on Site Reliability Engineering and related technologies.

Stop using debuggers, learn a mental model of a codebase: Bill Kennedy - The Reliability Podcast

Oct 3, 2023 By Last9 In Last9

The Reliability podcast speaks with engineers who have worked on large, complex systems and gleans from their learnings. What best practices should one adopt? What learnings are non-negotiable for getting better at a craft? What will ‘engineering’ look like with the advent of AI? We answer these questions and more, tracing the personal journeys of engineers who have built stellar careers around decoding the innumerable intricacies of software engineering.

In engineering, DON'T BUILD FAST: Bill Kennedy - The Reliability Podcast

Oct 3, 2023 By Last9 In Last9

The Reliability podcast speaks with engineers who have worked on large, complex systems and gleans from their learnings. What best practices should one adopt? What learnings are non-negotiable for getting better at a craft? What will ‘engineering’ look like with the advent of AI? We answer these questions and more, tracing the personal journeys of engineers who have built stellar careers around decoding the innumerable intricacies of software engineering.

Working Effectively With Executives During an Incident

Oct 2, 2023 By Ashley Sawatsky In Rootly

You’re in the incident channel rocking yet another incident. Comms are flowing, resolution is in sight, the team is grinding, and you’re feeling good. Then…

Observability Pillars: Exploring Logs, Metrics and Traces

Sep 29, 2023 By Chitra Bisht In Squadcast

The ability to measure the internal states of a system by examining its outputs is called Observability. A system becomes 'observable' when it is possible to estimate the current state using only information from outputs, namely sensor data. You can use the data from Observability to identify and troubleshoot problems, optimize performance, and improve security. In the next few sections, we'll take a closer look at the three pillars of Observability: Metrics, Logs, and Traces.

Blameless Demo 2023

Sep 29, 2023 By Blameless In Blameless

Blameless Announces New Google Docs and Google Drive Integration to Help Engineering Teams Enhance Their Incident Management and Retrospectives

Sep 28, 2023 By Blameless In Blameless

Leading incident management solution enables enterprises and their engineering organizations to produce, collaborate on, and share retrospectives more efficiently through automation.

Unveiling Past Incidents: Accelerating Incident Resolution with Historical Context

Sep 28, 2023 By Vishal Padghan In Squadcast

Having the context of how similar issues were handled in the past can be invaluable. It can help incident responders grasp the nature of recurring problems, their causes, and the solutions that have worked before. Squadcast’s Past Incidents feature assists incident responders by presenting them with a list of similar past incidents related to the service they are currently investigating.

Product Spotlight: Enhancing Incident Resolution with Blameless' Microsoft Teams Integration

Sep 28, 2023 By Aaron Lober In Blameless

In today's fast-paced digital landscape, swiftly responding to incidents is paramount for engineering teams. Downtime is not just costly; it can tarnish your organization's reputation. The pressure felt by engineering operations, DevOps, and SRE leaders to architect and run an effective incident response process is immense. Fortunately, over the last several years, effective engineering organizations have developed a standard toolkit for running a good incident response process.

Status Pages 101: Everything You Need to Know About Status Pages

Sep 26, 2023 By Sanjog Sandhu In Squadcast

Status Pages are critical for effective Incident Management. Just as an ill-structured On-Call Schedule can wreak havoc, ineffective Status Pages can leave customers and stakeholders adrift, underscoring the need for a meticulous approach. Two examples are Matsuri Japon, a non-profit organization, and Sport1, a premier live-stream sports content platform, both of which integrate Squadcast Status Pages to discreetly enhance their incident response strategies. You may read about them later. Crafting these Status Pages demands precision, along with dynamic updates and collaboration.

The Ultimate Guide to DORA Metrics for DevOps

Sep 25, 2023 By Anjali Udasi In Zenduty

In the world of software delivery, organizations are under constant pressure to improve their performance and deliver high-quality software to their customers. One effective way to measure and optimize software delivery performance is to use the DORA (DevOps Research and Assessment) metrics. Developed by the DORA research team, these metrics provide valuable insights into the effectiveness of an organization's software delivery processes.
