Blameless

SREview Issue #6 October 2020

Oct 16, 2020 By Blameless Community In Blameless

BOO! Did we scare you? We couldn’t help it; we’re just so happy it’s spooky season. Here’s the October issue of SREview! This monthly zine features epic tweets, content, and events happening in the SRE and resilience engineering community.

Can Security Teams Benefit from SRE? You bet!

Oct 13, 2020 By Emily Arnott In Blameless

When we talk about the reliability of services, SRE encourages us to take a holistic view. Unreliability in service delivery can be due to anything, from hardware malfunctions to errors in code. One source of unreliability that is often overlooked is security. A security breach can damage customer trust far beyond the impact of the breach itself. Even smaller infractions, like failing a service audit, can make users wary.

How to Construct a Reliability Model for your Organization

Oct 8, 2020 By Emily Arnott In Blameless

As you adopt SRE practices, you’ll find optimization opportunities across every part of your development and operations cycle. SRE breaks down silos and helps learning flow through every stage of the software lifecycle, forming new connections between teams and roles. Understanding all of these connections can be daunting. Building a model of SRE specific to your organization is a good way to keep a clear picture of them in your head.

This is your Guide for Implementing SRE in NOCs

Oct 1, 2020 By Emily Arnott In Blameless

Network Operations Centers, or NOCs, serve as hubs for monitoring and incident response. A NOC is usually a physical location within an organization, where operators sit at a central desk with screens showing current service data. However, a NOC’s functionality can also be distributed. Some organizations build virtual NOCs that are staffed fully remotely, which allows for distributed teams and follow-the-sun rotations. NOC as a service is another structure gaining in popularity.

The Ultimate, Free Incident Retrospective Template

Sep 30, 2020 By Hannah Culver In Blameless

Incident retrospectives (or postmortems, post-incident reports, RCAs, etc.) are the most important part of an incident. This is where you take the gift of that experience and turn it into knowledge. This knowledge then feeds back into the product, improving reliability and ensuring that no incident is a wasted learning opportunity. Every incident is an unplanned investment, and teams should strive to make the most of it.

Here's your Complete Definition of Software Reliability

Sep 24, 2020 By Emily Arnott In Blameless

We live in the era of software convenience, where we take for granted that hundreds of services are always at our fingertips. These applications become part of our daily routines because they are so reliable. However, this consistency makes reliability work invisible to the end user. It can be difficult to appreciate the effort behind maintaining a high-availability service, and as a result, people may misunderstand exactly what makes a service reliable.

Availability, Maintainability, Reliability: What's the Difference?

Sep 17, 2020 By Emily Arnott In Blameless

We live in an era of reliability, where users depend on having consistent access to services. When choosing between competing services, no feature is more important to users than reliability. But what does reliability mean? To answer this question, we’ll break down reliability in terms of two other reliability engineering metrics: availability and maintainability. Distinguishing these terms isn’t just a matter of semantics.

SREview Issue #5 September 2020

Sep 15, 2020 By Blameless Community In Blameless

Here’s the September issue of SREview! This monthly zine features epic tweets, content, and events happening in the SRE and resilience engineering community.

SRE Leaders Panel: Testing in Production

Sep 11, 2020 By Blameless Community In Blameless

Blameless recently had the privilege of hosting some fantastic leaders from the SRE and resilience community for a panel discussion. Our panelists discussed testing in production, how feature flagging and testing practices can help teams do it, and how to get managers on board with it. The transcript below has been lightly edited, and a recording of the full panel is also available for anyone interested in watching it.

How to Improve the Reliability of a System

Sep 8, 2020 By Emily Arnott In Blameless

Site reliability engineering is a multifaceted movement that combines many practices, mentalities, and cultural values. It looks holistically at how an organization can become more resilient, operating on every level from server hardware to team morale. At each level, SRE is applied to improve the reliability of relevant systems. With such wide-reaching impact, it can be helpful to take time to reevaluate how to improve the reliability of a system.