The latest News and Information on Service Reliability Engineering and related technologies.

Software is ubiquitous and can change our mood: Piyush Verma - The Reliability Podcast

The Reliability podcast speaks with engineers who have worked on large, complex systems and gleans insights from their learnings. What best practices should one imbibe? What are the non-negotiable learnings for becoming better at a craft? What will 'engineering' be like with the advent of AI? We answer these questions and more by tracing the personal journeys of engineers who have built stellar careers around decoding the innumerable intricacies of software engineering.

My job is an engineer = build ACs: Bill Kennedy - The Reliability Podcast

The job of a backend dev: Build good ACs - Bill Kennedy - The Reliability Podcast

The only industry not licensed to do their job - Engineering: Bill Kennedy - The Reliability Podcast

The WeWork-ization of software: Piyush Verma - The Reliability Podcast

Observability Pillars: Exploring Logs, Metrics and Traces

Observability is the ability to measure the internal states of a system by examining its outputs. A system is 'observable' when its current state can be estimated using only information from its outputs, namely sensor data. You can use Observability data to identify and troubleshoot problems, optimize performance, and improve security. In the next few sections, we'll take a closer look at the three pillars of Observability: Logs, Metrics, and Traces.
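As a rough illustration of how the three pillars differ, here is a minimal sketch using only the Python standard library. The in-memory sinks (LOGS, METRICS, SPANS) and the helper names (log, count, span, handle_request) are hypothetical and exist only for this example; a real system would typically export this data to an Observability backend instead.

```python
import json
import time
import uuid
from collections import Counter
from contextlib import contextmanager

# Hypothetical in-memory sinks, used here only for illustration.
LOGS = []
METRICS = Counter()
SPANS = []


def log(level, message, **fields):
    """Log: a timestamped, structured record of one discrete event."""
    record = {"ts": time.time(), "level": level, "msg": message, **fields}
    LOGS.append(json.dumps(record))


def count(name, value=1):
    """Metric: a numeric value aggregated over time (here, a counter)."""
    METRICS[name] += value


@contextmanager
def span(name, trace_id):
    """Trace: timed spans that share one trace_id for a single request."""
    start = time.perf_counter()
    try:
        yield
    finally:
        SPANS.append({
            "trace_id": trace_id,
            "name": name,
            "duration_s": time.perf_counter() - start,
        })


def handle_request(user):
    """Emit all three signal types while serving one (pretend) request."""
    trace_id = uuid.uuid4().hex
    with span("handle_request", trace_id):
        count("requests_total")
        with span("load_profile", trace_id):
            log("info", "profile loaded", user=user, trace_id=trace_id)
    return trace_id
```

Calling handle_request("alice") once leaves one log record, a requests_total counter of 1, and two spans sharing the same trace_id. The inner span is recorded first because it finishes first.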

Unveiling Past Incidents: Accelerating Incident Resolution with Historical Context

Knowing how similar issues were handled in the past can be invaluable. It helps incident responders grasp the nature of recurring problems, their causes, and the solutions that have proven effective. Squadcast's Past Incidents feature assists incident responders by presenting a list of similar past incidents related to the service they are currently investigating.
Sponsored Post

Status Pages 101: Everything You Need to Know About Status Pages

Status Pages are critical for effective Incident Management. Just as an ill-structured On-Call Schedule can wreak havoc, ineffective Status Pages can leave customers and stakeholders adrift, underscoring the need for a meticulous approach. Two examples are Matsuri Japon, a non-profit organization, and Sport1, a premier live-stream sports content platform; both integrate Squadcast Status Pages to discreetly enhance their incident response strategies. You may read about them later. Crafting these Status Pages demands precision, dynamic updates, and collaboration.