Latest News

Can Security Teams Benefit from SRE? You bet!

Oct 13, 2020 By Emily Arnott In Blameless

When we talk about the reliability of services, SRE encourages us to take a holistic view. Unreliability in service delivery can be due to anything, from hardware malfunctions to errors in code. One source of unreliability that is often overlooked is security. A security breach can damage customer trust far beyond the impact of the breach itself. Even smaller infractions, like failing a service audit, can make users wary.

Site reliability engineering-what is SRE?

Oct 11, 2020 By Amrit Balraj In Zenduty

As companies race to build site reliability engineering (SRE) practices within their engineering teams, site reliability engineering has become one of the hottest and highest-paying jobs in tech. The term was coined by Google engineer Benjamin Treynor in 2003, when he was tasked with making sure that Google services were reliable, secure, and functional.

DevOps/SRE Model: Bursting the Developer's Bubble. Here's the CTO Perspective.

Oct 7, 2020 By Yoram Pollack In BigPanda

Many organizations are transitioning toward a DevOps operational model, where software developers are responsible for operating the applications they develop, instead of a centralized IT operations group. In this “CTO Perspective” interview we talk to BigPanda’s CTO Elik Eizenberg about the challenges in that transition, and what it takes to make it easier. Lean back and watch the interview, or if you prefer reading, take a few minutes to read the transcript.

This is your Guide for Implementing SRE in NOCs

Oct 1, 2020 By Emily Arnott In Blameless

Network Operation Centers, or NOCs, serve as hubs for monitoring and incident response. A NOC is usually a physical location in an organization, where operators sit at a central desk with screens showing current service data. But the functionality of a NOC can also be distributed. Some organizations build virtual NOCs that can be staffed fully remotely, which allows for distributed teams and follow-the-sun rotations. NOC as a service is another structure gaining in popularity.

SREview Issue #5 September 2020

Sep 15, 2020 By Blameless Community In Blameless

Here’s the September issue of SREview! This monthly zine features epic Tweets, content, and events happening in the SRE and resilience engineering community.

SRE Leaders Panel: Testing in Production

Sep 11, 2020 By Blameless Community In Blameless

Blameless recently had the privilege of hosting some fantastic leaders in the SRE and resilience community for a panel discussion. Our panelists discussed testing in production, how feature flagging and testing can help us do that, and how to get managers to be on board with testing in production. The transcript below has been lightly edited, and if you’re interested in watching the full panel, you can do so here.

SRE + Honeycomb: Observability for Service Reliability

Sep 8, 2020 By Jenni Boyer In Honeycomb

As a Customer Advocate, I talk to a lot of prospective Honeycomb users who want to understand how observability fits into their existing Site Reliability Engineering (SRE) practice. While I have enough of a familiarity with the discipline to get myself into trouble, I wanted to learn more about what SREs do in their day-to-day work so that I’d be better able to help them determine if Honeycomb is a good fit for their needs.

How to Build Your SRE Team

Sep 1, 2020 By Emily Arnott In Blameless

As you implement SRE practices and culture at your organization, you’ll realize everyone has a part to play. From engineers setting SLOs, to management upholding the virtue of blamelessness, to marketing teams conducting retrospectives on email campaigns, there’s no part of an organization that doesn’t benefit from the SRE mentality.

SREview Issue #4 August 2020

Aug 21, 2020 By Blameless Community In Blameless

Here’s the August issue of SREview! This monthly zine features epic Tweets, content, and events happening in the SRE and resilience engineering community.

What is a Kubernetes Operator and Why it Matters for SRE

Aug 20, 2020 By Emily Arnott In Blameless

Kubernetes is an open-source project that “containerizes” workloads and services and manages deployment and configurations. Released by Google in 2015, Kubernetes is now maintained by the Cloud Native Computing Foundation. Since its release, it has become a worldwide phenomenon. The majority of cloud native companies use it, SaaS vendors offer commercial prebuilt versions, and there’s even an annual convention!

