SRE

The latest news and information on Site Reliability Engineering and related technologies.

8 SRE Best Practices to Help Developers Troubleshoot Kubernetes

Mar 15, 2023 By Lisa Wells In StackState

Maintaining reliable Kubernetes systems is not easy, especially for people who are not Kubernetes experts. This blog, part 2 of 3 in the “8 SRE Best Practices to Help Developers Troubleshoot Kubernetes” series, explains 8 simple best practices SREs can follow to help developers and other SREs build knowledge and effectively troubleshoot issues in applications running on Kubernetes.

What is SOC 2 Compliance? | A Guide to SOC 2 Certification

Mar 15, 2023 By Emily Arnott In Blameless

We’re excited to announce that Blameless is officially SOC 2 compliant! This is part of our larger efforts to assure all the users of Blameless and visitors to our site that we’re meeting and exceeding all of your privacy and security needs. Learn more by visiting our security page! When choosing a service, it’s important to have trust in the provider – especially for something as important as your incident management.

Squadcast + Auvik Integration: Routing alert made easy

Mar 14, 2023 By Vishal Padghan In Squadcast

Auvik is cloud-based network management software that gives you instant insight into the networks you manage and automates complex, time-consuming network tasks. If you use Auvik for network management, you can integrate it with Squadcast, an end-to-end incident response tool, to route detailed alerts from Auvik to the right users in Squadcast. This blog is a step-by-step guide to setting up the Squadcast-Auvik integration.

Protect PII and add geolocation data: Monitoring legacy systems with Grafana

Mar 14, 2023 By Mattias Segerdahl In Grafana

Legacy systems often present a challenge when you try to integrate them with modern monitoring tools, especially when they generate log files that contain personally identifiable information (PII) and IP addresses. Thankfully, Grafana Cloud, which is built to work with modern observability tools and data sources, makes it easy to monitor your legacy environments too.

Adopting SRE: Standardizing your SLO design process

Mar 11, 2023 By Derek Remund In Google Operations

Designing SLOs is a key SRE competency which requires careful consideration and a consistent approach to implementation.

Datadog On Reliability Engineering

Mar 7, 2023 By Datadog In Datadog

There are many different ways to implement Site Reliability Engineering (SRE). From team structures to roles and responsibilities to planning and prioritization flows, there’s no golden path for how to organize things. As Datadog has shifted from a startup to a quickly-growing public company, we’ve seen our own SRE practice evolve. With over 22,000 customers sending trillions of data points each day, keeping Datadog reliable is critical to our business.

What Does IT Maturity Even Mean?

Mar 7, 2023 By Aaron Lober In Blameless

Seriously… What are people trying to say by “Your approach to IT Operations needs to mature”? Fair question. Billions of dollars are spent every year on software solutions to help IT organizations operate more efficiently. How could it be that with all that investment, we’re still not netting enough efficiency gains? The truth is, our technology landscape has evolved, our operational models have evolved, we have evolved.

SLA vs SLO vs SLI - What's the difference

Mar 7, 2023 By Last9 In Last9

What's the difference between SLAs, SLOs, and SLIs? Understanding these nuances is critical for DevOps folks. Here's a simple ready reckoner on what each of these means.

Reducing Security Incidents: Implementing Docker Image Security Scanner

Feb 28, 2023 By Shishir Khandelwal In Squadcast

Are you using Docker to deploy your applications? If so, you're not alone. Docker use has skyrocketed in recent years. While it offers numerous benefits, it also introduces new security risks that need to be addressed. But why is reducing security incidents so important? Simple: the cost of a security breach can be devastating. From lost customer trust to financial losses, the consequences of a security incident can be severe. That's why it's crucial to take steps to prevent them from occurring in the first place. Enter Docker image security scanners.

Webinar on 'Evolution of Incident Management from On-Call to SRE' | Squadcast

Feb 26, 2023 By Squadcast In Squadcast

Incident management has evolved considerably over the last decade, and even more so in the last few years. What was traditionally limited to an in-house on-call team and an alerting system has now grown well beyond that to ensure Automation, Collaboration, Transparency, and Retrospection are deeply entrenched in Incident Response.
