Zenduty

An Accidental Shutdown - War Room Story from Ex-Roblox's SRE

Feb 20, 2025 By Zenduty In Zenduty

Former Roblox Sr. Engineering Manager Denys Pashutynski shares a classic reliability horror story from 20 years ago in Ukraine - when one misplaced command shut down the entire corporate LDAP controller. From The Incidentally Reliable podcast - real stories from the trenches of site reliability engineering. Made by SREs for SREs and hosted by Zenduty. Zenduty is a revolutionary incident management platform that gives you greater control and automation over the incident management lifecycle.

Ex-Roblox SRE's take on SRE vs. DevOps

Feb 19, 2025 By Zenduty In Zenduty

Former Roblox Sr. Engineering Manager Denys Pashutynski clarifies the fundamental difference between SRE and DevOps roles: SREs handle the customer-facing production edge, while DevOps focuses on background automation. From The Incidentally Reliable podcast - real stories from the trenches of site reliability engineering. Made by SREs for SREs and hosted by Zenduty.

Balancing Technical Debt in Fast-Growing Teams

Feb 18, 2025 By Zenduty In Zenduty

Sometimes messy code is better than perfect code. Hear from Ramiro Berrelleza on why over-cleaning technical debt can paralyze your startup's growth, and when it's okay to move fast and fix later. From The Incidentally Reliable podcast - real stories from the trenches of site reliability engineering. Made by SREs for SREs and hosted by Zenduty.

Ex-Google SRE's 3-Minute On-Call Response

Feb 17, 2025 By Zenduty In Zenduty

Ever wondered about the most intense on-call requirements? Ex-Google SRE Niall Murphy reveals the Google traffic team's strict 3-minute SLA and the $2,500/second stakes in the ads system. From The Incidentally Reliable podcast - real stories from the trenches of site reliability engineering. Made by SREs for SREs and hosted by Zenduty.

The biggest mistake by Devtool founders

Feb 14, 2025 By Zenduty In Zenduty

Key advice from Ramiro Berrelleza (CEO and founder of Okteto): don't get attached to your solution, get attached to the problem you're solving! Watch how this mindset helped build a successful Kubernetes developer experience tool. Exclusively on The Incidentally Reliable podcast, which is made by SREs for SREs and hosted by Zenduty.

The Hard Truth About the Observability Landscape

Feb 12, 2025 By Zenduty In Zenduty

Why are ex-FAANG engineers building observability companies? When millions depend on reliable software, a simple reboot isn't enough anymore. Piyush Verma discusses modern software reliability. Exclusively on The Incidentally Reliable podcast, which is made by SREs for SREs and hosted by Zenduty.

Incident Severity Levels: A Complete Technical Guide

Feb 12, 2025 By Rohan Taneja In Zenduty

Incidents are inevitable, but how you react to them can make all the difference. Not all incidents are created equal, and the main challenge many SRE teams face is responding to each one appropriately. When an incident occurs, the first question to answer is "how severe is it?" Incident severity levels help answer it by classifying each incident against predefined guidelines.

Think Fast: When SREs saved the customer experience

Feb 11, 2025 By Zenduty In Zenduty

How quick decision-making saved the customer experience! Featuring Piyush Verma (CTO, Last9). Exclusively on The Incidentally Reliable podcast, which is made by SREs for SREs and hosted by Zenduty.

Reliability vs Availability: A complete guide to system performance metrics

Jan 31, 2025 By Rohan Taneja In Zenduty

In an always-digital world where users expect reliable services, businesses must measure two critical metrics: reliability and availability. The two terms are often used interchangeably, but understanding the difference is crucial when building systems that users can trust and depend on. Both metrics are vital, but depending on your use case, you might prioritize one over the other. Take the 2017 AWS S3 outage.

AI-Powered Incident Management by Zenduty

Jan 28, 2025 By Zenduty In Zenduty

Discover how Zenduty's AI-powered features can help your teams resolve issues faster, reduce downtime, and improve collaboration. In this personalized demo, you'll see how these features simplify complex workflows and reduce manual effort on repetitive tasks. Book your demo to.
