SRE

The latest news and information on Site Reliability Engineering and related technologies.

Burn rate is a better error rate

While building our Service Level Objectives (SLOs) product, our team at Datadog often needs to consider how error budgets and burn rates work in practice. Although both concepts are discussed in foundational sources such as Google’s Site Reliability Workbook, for many people the terms remain ambiguous. Is an error budget a static quantity or a varying percentage? Does burn rate indicate how fast I’m spending a fixed quantity, or is it just another way to express error rate?
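To make the question concrete, here is a minimal Python sketch of one common reading of these terms, in which burn rate is the observed error rate divided by the error rate the SLO allows. The function names (`error_budget`, `burn_rate`, `days_until_exhausted`) and the sample numbers are illustrative assumptions, not taken from the article or from any Datadog API.

```python
def error_budget(slo_target: float) -> float:
    """Fraction of events allowed to fail under the SLO (0.001 for a 99.9% target)."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return 1.0 - slo_target


def burn_rate(bad_events: int, total_events: int, slo_target: float) -> float:
    """Observed error rate divided by the error rate the SLO allows.

    A value of 1.0 means the budget would be used up exactly at the end
    of the SLO window; higher values mean it runs out sooner.
    """
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    observed_error_rate = bad_events / total_events
    return observed_error_rate / error_budget(slo_target)


def days_until_exhausted(window_days: float, rate: float) -> float:
    """Days until a full budget is spent if the burn rate stays constant."""
    if rate <= 0:
        return float("inf")  # no errors observed, so the budget never runs out
    return window_days / rate


# Hypothetical numbers: 50 failed requests out of 10,000 against a 99.9% SLO.
rate = burn_rate(bad_events=50, total_events=10_000, slo_target=0.999)
print(f"burn rate: {rate:.1f}")
print(f"days to exhaustion (30-day window): {days_until_exhausted(30, rate):.1f}")
```

Under this reading, burn rate is the error rate normalized by the SLO target, which is why the same 0.5% error rate is a far higher burn rate against a 99.9% target than against a 99% one.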

Top Features to Look for in Enterprise Incident Management Software

Are you tired of dealing with unexpected system crashes and the chaos they bring? You're not alone. For enterprise SREs, DevOps, and IT Operations teams, mastering incident management goes beyond just fixing problems; it’s about preventing them. According to a recent report, incident volume within enterprise companies rose by 16% during 2023, highlighting the growing complexity and risk in digital operations. This underscores the urgent need for robust incident management solutions.

Introducing Statusy - An Open Source Status Page Aggregator

A quick walkthrough of Statusy, an open-source status page aggregator that centralizes service monitoring for your team. Created by Yash Jain at Squadcast, Statusy simplifies tracking with a unified dashboard and flexible notifications. Set up in minutes and keep your team informed!

OpenTelemetry Filelog Receiver: Collecting Logs from Kubernetes

Master log collection in Kubernetes with OpenTelemetry's filelog receiver. Learn to configure, optimize, and troubleshoot log collection from various sources including syslog and application logs. Discover advanced parser operator techniques for robust observability.