Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on Service Reliability Engineering and related technologies.

SRE fundamentals 2021: SLIs vs. SLAs. vs SLOs

A big part of ensuring the availability of your applications is establishing and monitoring service-level metrics—something that our Site Reliability Engineering (SRE) team does every day here at Google Cloud. The end goal of our SRE principles is to improve services and in turn the user experience. The concept of SRE starts with the idea that metrics should be closely tied to business objectives. In addition to business-level SLAs, we also use SLOs and SLIs in SRE planning and practice.

7 Ways SRE Is Changing IT Ops And How To Prepare For Those Changes

SRE best practices are disrupting and catalyzing change in the ways organizations approach IT Operations. In this blog we look at 7 ways SRE is bringing this transition. ‍Site Reliability Engineering is a new practice that has been growing in popularity among many businesses. Also known as SRE, the new activity puts a premium on monitoring, tracking bugs, and creating systems and automations that solve the problem in the long term.

Using Coralogix + StackPulse to Automatically Enrich Alerts and Manage Incidents

Keeping digital services reliable is more important than ever. When something goes wrong in production, on-call teams face significant pressure to identify and resolve the incident quickly – in order to keep customers happy. But it can be difficult to get the right signals to the right person in a timely fashion.

Creating Custom Slack Commands

Site Reliability Engineers are expected to know everything that’s happening, all of the time. That’s a lot of things! To help you sift through the noise, we’ve developed a feature that lets you find accurate data about your organization on-demand. You can do this by sending custom-designed commands to FireHydrant directly from your integrated Slack account.