
SRE

The latest news and information on Site Reliability Engineering and related technologies.

Status Pages That Deliver: Top 10 Favorites

Status pages are an invaluable asset for websites and SaaS businesses, particularly now that outages are common and users expect seamless uptime. As an integral part of any robust website monitoring strategy, they serve as centralized hubs that give users a single, authoritative source for tracking the status of websites and applications.

Status Pages 101: How to Create a Status Page You and Your Customers Will Actually Want to Use

This blog post is adapted from my talk at SRECon EMEA 2023 - original slides are available here! Status pages are a simple yet underutilized element of incident communication. Done well, they’re a low-lift way to keep your customers and stakeholders informed when incidents impact them. But without a solid approach, updating status pages can easily become a tedious and often neglected task during incidents. In this post, we’ll cover some tips to get your status page right.

Introducing Squadcast's Global Event Rulesets | Incident Management | Squadcast

This video gives you a walkthrough of Squadcast's new feature, 'Global Event Rulesets', which helps you simplify alert routing and boost efficiency. Global Event Rulesets enable you to manage alert routing across services and automate actions based on predefined global rules.

Secret to Flawless Deployments: Real-Time Canary Deployment tracking with Argo CD & Levitate!

Most of your outages are probably caused by a change, and having observability around changes makes a big difference. Dive into this walkthrough, where we showcase tracking canary deployments in Argo CD and correlating events and metrics seamlessly with Levitate. It is aimed at Site Reliability Engineers, DevOps engineers, Software Engineers, and Product Managers who want to improve their observability and ensure smooth deployments every time.

Tips To Never Miss An Incident Notification With Squadcast Escalations Policies

Companies implement an Incident Response process to promptly resolve critical issues. Setting up escalation policies to notify engineers is a key step in this process. With traditional escalation policies, alert notifications still get missed, which results in longer response times and failure to meet SLAs. So, how can one ensure incident notifications are never missed?
Sponsored Post

Opsgenie Alternatives: Finding the Right Fit for your Incident Management Teams

In the dynamic landscape of modern IT operations and Incident Management, choosing the right tool is paramount to ensuring the resilience of your organization. Opsgenie, a popular Incident Response and Alerting platform, has been a go-to choice for many. However, as businesses grow and requirements evolve, exploring Opsgenie alternatives becomes essential in the quest to find the perfect fit for your unique operational needs. In this blog, we'll embark on a journey to uncover and evaluate some compelling alternatives to Opsgenie, helping you navigate the vast sea of options and make an informed decision that aligns perfectly with your team's workflows and objectives.