SRE

The latest news and information on Site Reliability Engineering and related technologies.

Site Reliability Engineering (SRE) Survey Now Open for 2022 - Calling All Reliability Practitioners and Leaders

Now in its fifth year, the SRE Survey is sponsored by Catchpoint, in partnership with Blameless, to uncover new trends and challenges for teams focused on advancing the reliability of digital products.

Squadcast Product Demo | Incident Management | On-call | SRE | Status Page | SLO Tracker | Runbooks

This short video explains why Squadcast is a feature-rich solution for SRE, DevOps, and engineering teams in general. It covers the product's main capabilities: quickly mobilizing response teams during critical incidents, managing on-call schedules, and tracking SLOs.

Setting up Route 53 Health Checks

We live in an age where the internet and digital data drive modern-day markets, which results in huge amounts of data being generated and consumed. It has therefore become very important for online platforms to manage this traffic and serve their customers efficiently. In this blog post we explore the Amazon Route 53 service and see how it addresses Domain Name System (DNS) routing and health check problems.

Calling all Reliability Practitioners: Participate in the SRE Survey 2022

For the past four years, Catchpoint and various partners have run a yearly SRE Survey. This year, Blameless is excited to partner with Catchpoint for the fifth annual survey. We want to hear from you if you are in a DevOps or SRE role, or if you work on reliability under some other title or role. There is a great deal to learn when we listen closely to practitioners.

Squadcast + OSNexus QuantaStor Integration: Making Incident Management & Alerting more effective

Storage systems are an integral part of IT infrastructure. Because modern markets are highly competitive and demanding, businesses strive for 24/7 availability, which in turn raises the expectation that storage systems stay operational at all times. But like other IT components, storage systems are prone to incidents. It is therefore important to have an efficient communication process for managing alerts during system failures and disasters.

5 Reliability Insights That Immediately Transform Your SRE

As an infrastructure engineer, you can learn a great deal from studying past incidents. Blameless Reliability Insights helps you find patterns that better equip you to deal with incidents to come. If you've never used it before and you're curious what it looks like, you can watch a video demo here! These statistical insights give you the power to learn everything you can when something goes wrong.