Operations | Monitoring | ITSM | DevOps | Cloud

SRE

The latest News and Information on Service Reliability Engineering and related technologies.

How Stress Affects Our Learning Abilities in Incidents (And What To Do About It)

While retrospectives provide a valuable pathway for learning outside of the flow of work, we also want learning to happen during an incident or unexpected event as it unfolds. This can be challenging due to the negative impact of stress on our ability to learn and navigate difficult situations. In this article, we’ll dig into how stress inhibits our ability to learn and what we can do about it.

Introducing Squadcast's Audit Logs: Enhanced Visibility and Control

Maintaining comprehensive records of user and entity-related changes within your Incident Management platform is crucial. Organizations have long relied on external analytics tools for these insights. However, the demand for an integrated solution within Squadcast has been growing. We are excited to introduce Squadcast's Audit Logs feature, designed to address this need directly within our platform.

Purpose and Goals of Daily Stand-up Meetings

Stand-up meetings are a cornerstone for any engineering team. When done right, they can make a huge difference in keeping everyone on the same page, fostering collaboration, and building a strong team culture. However, getting them right can be a bit tricky. Drawing from our own experience of running engineering stand-ups at Zenduty, and insights from some of the best engineering managers in my network, I'd love to share some tips and insights on how to make your stand-ups effective.

5 Reasons to Switch from PagerDuty to a More Effective Alternative

When it comes to Incident Management, having the right tool can make all the difference between a swift resolution and prolonged downtime. While PagerDuty has long been a staple in the industry, many teams are finding more effective alternatives that better align with their needs and offer significant advantages. Here, we explore five compelling reasons to consider switching from PagerDuty to more efficient alternatives.

The Best SRE Tools To Improve Reliability and Streamline Operations

For better or worse, most companies—including their execs and developers—see SREs as superheroes who’ll save them from the evils of downtime and service degradation with their boundless superpowers. SREs are expected to constantly perform dangerous stunts like production debugging or communicating highly technical issues to angry VPs. They must also be able to manage infrastructure, networks, databases, pipelines, operating systems and much more.

Optimizing Incident Management: Effective Stakeholder Communication with Squadcast

When a critical system goes down, every minute counts. Amid the chaos, it's easy to overlook a crucial aspect of Incident Management: keeping stakeholders informed. However, neglecting stakeholder communication can have disastrous consequences, including misinformation, delayed decisions, and frustration. Effective stakeholder communication is essential for ensuring a coordinated, efficient, and transparent response to incidents.

Rootly On-Call: On-Call Shadowing Feature

Shadowing experienced responders is one of the most effective ways for folks who are new to on-call to gain the confidence and knowledge to handle incidents independently. Traditionally, shadow rotations are cumbersome to set up, involving duplicating and editing an existing schedule. For Rootly On-Call users, setting up shadow rotations couldn’t be easier with our new native Shadowing feature. Here are a few highlights.