Latest News

The Incident Review: 4 Odd Incidents Caused by Animals

May 21, 2021 By JJ Tang In Rootly

Incidents and outages caused by animals highlight the importance of flexibility and out-of-the-box thinking when it comes to SRE.

Resilience in Action Episode 7: Killing Ops with Tony Hansmann

May 19, 2021 By Blameless Community In Blameless

Resilience in Action is a podcast about all things resilience, from SRE to software engineering, to how it affects our personal lives, and more. Resilience in Action is hosted by Kurt Andersen. Kurt is a practitioner and an active thought leader in the SRE community. He speaks at major DevOps & SRE conferences and publishes his work through O'Reilly in quintessential SRE books such as Seeking SRE, What is SRE?, and 97 Things Every SRE Should Know.

SREview Issue #13 May 2021

May 18, 2021 By Blameless Community In Blameless

Is it a coincidence that “May” and “yay” rhyme? Probably not. This month has been pretty exciting for us here at Blameless, and we’d love to share why. We also have some of our favorite Tweets, content, and events happening in the SRE and resilience engineering community this month.

SRE vs. DevOps [Understanding Differences & Similarities]

May 17, 2021 By Emily Arnott In Blameless

Site Reliability Engineering (SRE) and DevOps share a goal of building a bridge between development and operations. We'll explore and compare both approaches. Wondering which is better for your company, SRE or DevOps? Neither SRE nor DevOps is “better,” exactly, since they’re similar yet different in a few key ways: SRE, or site reliability engineering, is a methodology developed by Google engineer Ben Treynor Sloss in 2003.

Make your Onboarding Experience Better with a Murder Mystery Game

May 17, 2021 By Blameless Community In Blameless

Onboarding a new tool can be boring. Or stressful. Or both. When onboarding an incident response tool, it can be difficult to make sure that your team is getting the most from the experience. Do you opt for a run-of-the-mill meeting, or try to learn while in an incident? Neither option is ideal. That’s why Petal’s DevOps Engineer Michael Cole found a new way to get his team using Blameless for their incident response process.

SRE Availability Metrics

May 17, 2021 By John Hasinsky In PagerTree

How available is your website, service, or platform? What must you monitor and measure to ensure availability? How do you translate uptime into availability? This chart has numbers that every Site Reliability Engineer (SRE) should know. Below the chart, you will find answers to commonly asked questions about SRE and associated metrics.

A Day in the Life: Intelligent Observability at Work with our SRE, Dinesh

May 17, 2021 By Helen Beal In Moogsoft

When I asked Charlie for permission to attend this year’s AICon (virtual, natch) I thought it would be a shoo-in; learning’s part of my OKRs after all. But he never makes things easy and his ‘yes’ came with a caveat that’s typical when dealing with him. This time, he claimed he didn’t have the budget for the ticket (a likely story!) and I’d have to find another way to get one.

Practical Guide to SRE: Using SLOs to Increase Reliability

May 13, 2021 By Quentin Rousseau In Rootly

Service Level Objectives (SLOs) are a key component of any successful Site Reliability Engineering initiative. The question is: what are SLOs, and how do you determine what your SLOs should be? Once you've done that, how should you use them?

SRE Leaders Panel: Business Agility is what matters, SRE can help you get there

May 11, 2021 By Blameless Community In Blameless

Blameless recently had the privilege of hosting SRE leaders Garima Bajpai, Founder at Community of Practice - DevOps Canada, and Jason Fraser, Delivery Lead at VMware Tanzu, to discuss the value of crisis during incident response, the best and worst tech transformations they’ve seen, how reliability impacts the flow of value, and more.

SRE fundamentals 2021: SLIs vs. SLAs. vs SLOs

May 7, 2021 By Adrian Hilton In Google Operations

A big part of ensuring the availability of your applications is establishing and monitoring service-level metrics—something that our Site Reliability Engineering (SRE) team does every day here at Google Cloud. The end goal of our SRE principles is to improve services and in turn the user experience. The concept of SRE starts with the idea that metrics should be closely tied to business objectives. In addition to business-level SLAs, we also use SLOs and SLIs in SRE planning and practice.
