Latest Videos

Press Start to Scale: SRE in Gaming - Incidentally Reliable with Denys Pashutynski

In our latest episode, we speak with Denys Pashutynski, Senior Engineering Manager of Site Reliability at Roblox, about the formidable challenges of sustaining a global gaming platform. Drawing on his tenure at Twitter, AWS, and eBay, Denys delves into managing traffic surges, optimizing latency, and handling change management strategically. Available exclusively on The Incidentally Reliable podcast, made by SREs for SREs and hosted by Zenduty.

Battle-Tested Reliability Strategies - Incidentally Reliable with Abhishek Ghosh

We dive into the trenches with Abhishek Ghosh, a veteran who led SRE teams at Pinterest and now does so at Cribl. He shares gripping war-room stories from Pinterest, strategies for maintaining uptime, insights into the role of AI in observability, and more! Discover the future of SRE and learn how to navigate the challenges of digital reliability. Tune in for valuable lessons from one of the industry's leading experts.

Tutorial: Integrating Grafana with Zenduty

Zenduty is a distributed, end-to-end major incident management platform for production engineering teams that helps you minimize downtime, implement scalable incident response processes, and institutionalize site reliability within your organization. Grafana is a multi-platform, open-source analytics and interactive visualization web application. When connected to supported data sources, it can produce charts, graphs, and alerts for the web.

Tutorial: Integrating Prometheus with Zenduty

Zenduty is a distributed, end-to-end major incident management platform for production engineering teams that helps you minimize downtime, implement scalable incident response processes, and institutionalize site reliability within your organization. Alertmanager is the component of the Prometheus ecosystem that handles alerts. It deduplicates and groups incoming alerts, then routes them to the appropriate receiver integrations, such as email, Slack, or custom webhooks.
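To make the deduplicate, route, and group steps concrete, here is a minimal Python sketch of the idea. It is not Alertmanager's implementation: real Alertmanager is configured in YAML with a route tree, `group_by`, matchers, and timing settings such as `group_wait`. The flat first-match route list, the alert dictionaries, and the receiver names (`zenduty-webhook`, `slack`, `email`) are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical alerts; as in Alertmanager, an alert's identity is its full label set.
ALERTS = [
    {"labels": {"alertname": "HighLatency", "service": "api", "severity": "critical"}},
    {"labels": {"alertname": "HighLatency", "service": "api", "severity": "critical"}},  # duplicate
    {"labels": {"alertname": "HighLatency", "service": "web", "severity": "warning"}},
]

# Hypothetical routes, checked in order; the first matching route wins.
ROUTES = [
    ({"severity": "critical"}, "zenduty-webhook"),
    ({"severity": "warning"}, "slack"),
]


def fingerprint(labels: dict) -> tuple:
    """Build a hashable identity from the sorted label set."""
    return tuple(sorted(labels.items()))


def deduplicate(alerts: list) -> list:
    """Keep the first alert seen for each distinct label set."""
    seen = {}
    for alert in alerts:
        seen.setdefault(fingerprint(alert["labels"]), alert)
    return list(seen.values())


def route(labels: dict, routes: list, default: str) -> str:
    """Return the receiver of the first route whose matchers all equal the alert's labels."""
    for matchers, receiver in routes:
        if all(labels.get(key) == value for key, value in matchers.items()):
            return receiver
    return default


def group(alerts: list, group_by: list) -> dict:
    """Bucket alerts by the values of the chosen labels."""
    groups = defaultdict(list)
    for alert in alerts:
        key = tuple(alert["labels"].get(name, "") for name in group_by)
        groups[key].append(alert)
    return dict(groups)


def dispatch(alerts: list, routes: list, default: str, group_by: list) -> dict:
    """Deduplicate, route each alert to a receiver, then group per receiver."""
    by_receiver = defaultdict(list)
    for alert in deduplicate(alerts):
        by_receiver[route(alert["labels"], routes, default)].append(alert)
    return {receiver: group(items, group_by) for receiver, items in by_receiver.items()}


if __name__ == "__main__":
    for receiver, groups in dispatch(ALERTS, ROUTES, "email", ["alertname"]).items():
        for key, members in groups.items():
            print(receiver, key, len(members))
```

Here the duplicate critical alert collapses into one, the critical alert goes to the hypothetical webhook receiver, and the warning goes to Slack. In a real setup, Zenduty would sit behind a webhook receiver like the first route.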

The Science of Building Cloud Native DevTools - Incidentally Reliable with Ramiro Berrelleza

Catch Ramiro Berrelleza, Founder and CEO of Okteto, as he talks about how impactful DevTool startups are built, why investing in developer experience matters, and the emerging issues in the cloud-native ecosystem.

Credit-Worthy Reliability - Incidentally Reliable with Krishnendu Majumdar

Catch Krishnendu Majumdar (CPTO at Yubi) as he talks about his journey through the dynamic Indian startup ecosystem, strategies for building for scale from Day 1, and insights into earning sustained user trust through exceptional product performance in high-governance industries like credit and finance.

Reliability for the Books - Incidentally Reliable with Niall Murphy

Catch Niall Murphy (Co-Founder of Stanza Systems) as he talks about graceful degradation, what startups get wrong about reliability, and how well-thought-out user experiences can communicate credibility to current and potential customers. Available exclusively on The Incidentally Reliable podcast, made by SREs for SREs and hosted by Zenduty.

What are some startups Solomon Hykes is rooting for?

What are some startups Solomon Hykes is rooting for? What's his most controversial opinion? Who are some community members more people should follow? Discover the answers to these questions and much more in The Incidentally Reliable podcast with Solomon Hykes, live on all major platforms! Tune in as Solomon shares stories from the early days of Docker, Inc., the rollercoaster journey to 20 million active developers worldwide, the heavy crown of a tech leader, and his current vision to revolutionize CI/CD with Dagger.