Latest News

The 2023 SRE Report provides the broadest independent insights into SRE Practices

Nov 8, 2022 By Catchpoint In Catchpoint

Findings from the 5th edition of The SRE Report show that lower TCO, driving growth, and retaining customers are key business drivers for adopting SRE practices.

Guide to Service Level Indicators and Setting Service Level Objectives

Nov 8, 2022 By Last9 In Last9

A guide to setting practical Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for your Site Reliability Engineering practice.

Empower the SREs - Conclusions from The SRE Report 2023

Nov 8, 2022 By Steve McGhee In Catchpoint

Let's be honest, nobody loves surveys. Ok, well I sure don't. But surveys satisfy a huge need in our demand for insights into complex human-computer, sociotechnical systems. It turns out that we've been measuring the computer part pretty well, but the humans – not as easy to keep track of. When Google SRE first defined toil as a metric we wanted to reduce, we spent far too long trying to quantify it numerically based on tooling and insights from computer systems.

Introducing a more complete logs forwarding experience

Nov 7, 2022 By Prineet Kaur Bhurji In Platform.sh

One of the key attributes of DevOps and SRE engineers is their ability to meticulously observe and monitor all of their applications, a task that can be achieved more efficiently by forwarding all generated logs to a central endpoint. With centralized logging, engineers can, at any time, get an accurate overview of all events taking place across their applications from one place. Storing logs in an external system also helps companies comply with many certifications.

For incident management, should you build or buy?

Nov 7, 2022 By Aaron Lober In Blameless

Is your incident response held together by a thread? Are you manually recording incident updates in a shared doc? Do you struggle to juggle the incident management workload with your other responsibilities? Does everyone on-call report data the same way? These are all common problems faced by DevOps teams still relying on homegrown incident management tooling.

Service Level Management Process Explained (with Examples)

Nov 3, 2022 By Myra Nizami In Blameless

Service Level Management, or SLM, is defined as the process of negotiating Service Level Agreements and ensuring that they are met. Service Level Management is a fundamental part of SRE and DevOps. It encompasses the expectations and perceptions that both the business and the customer have about the service and its performance. Service level management covers existing services as well as new ones as they are added, with the service level agreements (SLAs) being modified accordingly.

Why 'owning Services' is critical for effective Incident Response

Oct 31, 2022 By Vardhan NS In Squadcast

There is a famous quote that goes like this: ‘For every minute spent organizing, an hour is earned.’ At least in the world of incident response, nothing is more apt than this. Digital infrastructure these days is made up of multiple services, and an outage could result from one impacted service or several. So it's essential to have a catalog of all the services, along with the point of contact (service owner) responsible for maintaining each one.

On Building a Platform Team

Oct 31, 2022 By Jess Mink In Honeycomb

It may surprise you to hear, but Honeycomb doesn’t currently have a platform team. We have a platform org, and my title is Director of Platform Engineering. We have engineers doing platform work. And we even have an SRE team and a core services team. But a platform team? Nope. I’ve been thinking about what it might mean to build a platform team up from scratch (a situation some of you may also be in), and it led me to ask crucial questions. What should such a team own?

Routing alerts from AWS Elastic Beanstalk via CloudWatch

Oct 27, 2022 By Vishal Padghan In Squadcast

Amazon Web Services (AWS) offers 100+ services, each focusing on a specific area of functionality. However, it can be challenging to pick the right services for the task and to provision them. AWS Elastic Beanstalk lets you easily deploy and manage applications without needing to learn about the underlying infrastructure that runs them.
