The latest News and Information on Service Reliability Engineering and related technologies.

An Introduction to Incident Response Roles

Oct 22, 2021 By JJ Tang In Rootly

Learn about the key roles within an incident response team, as well as optional incident roles you may not have thought about.

SRE vs. DevOps: What Are the Differences and How Can They Work Together?

Oct 20, 2021 By LogicMonitor In LogicMonitor

The growing importance of technology to business success has pushed practically every company to hire competent, experienced IT professionals. As technology ecosystems grow more complex, organizations need a broader range of professionals to handle tasks like product development, troubleshooting, and customer service. SRE and DevOps have emerged as two of the most important approaches to meeting that need.

Top 13 Site Reliability Engineer (SRE) Tools

Oct 20, 2021 By Jacob Hall In Dotcom-Monitor

The role and responsibilities of a site reliability engineer (SRE) may vary with the size of the organization. For the most part, an SRE juggles multiple tasks and projects at once, so the tools they use reflect their ever-evolving responsibilities. A typical SRE is busy automating, cleaning up code, upgrading servers, and continually monitoring performance dashboards, so they tend to carry a lot of tools in their toolbelt.

What Managed Kubernetes Service is Best for SREs?

Oct 15, 2021 By Quentin Rousseau In Rootly

A comparison of EKS, AKS, GKE, Rancher and OpenShift from an SRE’s perspective.

Site Reliability Engineering: Top SRE Tools As Voted On By SREs

Oct 11, 2021 By Leo Vasiliou In Catchpoint

Catchpoint is proud to present the top SRE tools as voted on by SREs. In our fourth annual SRE Survey, compiled in partnership with VMware Tanzu Observability and DevOps Institute, we simply asked, “What are a few tools that every SRE should have available in their toolbelt?” Today, we are excited to share the findings with you. While some of the answers were not strictly tools, the analysis gives us valuable insight into the mindset of an SRE.

What SREs Can Learn from Facebook's Largest Outage

Oct 8, 2021 By JJ Tang In Rootly

Facebook’s October 2021 outage was the type of event that gives SREs nightmares: a series of critical business apps crashed within minutes and stayed unavailable for hours, disrupting more than 3.5 billion users around the world and costing about $60 million. As incidents go, this was a big one.

4 xMatters Use Cases That May Surprise You

Oct 6, 2021 By Megan Lo In xMatters

xMatters is part technology, part service reliability, and a little bit of magic. If you’ve spent time on the xMatters website, you’ve likely seen a number of valuable use cases for the platform: it can alert SREs when there’s a website outage, accelerate product development for DevOps teams, and manage on-call schedules and alerts for support teams.

5 AIOps Use-Cases: How AIOps Helps IT Teams

Oct 6, 2021 By Phil Tee In Moogsoft

In a world where everything is digital, you need AIOps to help ensure uptime and cut through the noise. Still not sold? Let's explore five ways SRE and DevOps teams are using AIOps to boost their existing monitoring tools.

What is a Site Reliability Engineer (SRE)?

Oct 6, 2021 By Jacob Hall In Dotcom-Monitor

A site reliability engineer, or SRE, holds a role that encompasses aspects of both software engineering and operations/infrastructure. Site reliability engineering also refers to a strategy and a set of practices and principles applied across service offerings, and it is closely tied to DevOps and operations. The term site reliability engineering originated at Google in 2003, when the company created a site reliability team. At the time, that team was made up of software engineers.