Puppet Control Repository: Your Source of Truth for Infrastructure Management

Learn the fundamentals of Puppet's Control Repository with Margaret and Tony in this comprehensive walkthrough. See how Control Repos serve as your single source of truth for managing configuration across your entire infrastructure, driving collaboration and standardization while simplifying code deployments.

Behind Megaport's Network Automation Platform

We’ve teamed up with the Heavy Networking podcast to take you under the hood of Megaport’s resilient, software-driven network. Luke Gollan, Network Automation Engineer at Megaport, joins Heavy Networking hosts Ethan Banks and Drew Conry-Murray to unpack what happens when you click “provision” in the Megaport portal.

Behind the Dashboard: How to monitor your LLM integrations

Behind the Dashboard is an ongoing series where we look under the hood of a specific Catchpoint feature. Each episode breaks down the technology itself, what’s challenging about using it for monitoring, and how we removed friction and toil to make it a valuable part of the Catchpoint platform. In this episode, Leon, Mursi, and Rahul take a look at Catchpoint’s LLM monitoring capabilities, including ensuring your integrated LLMs are up and performing optimally, as well as knowing whether you’re using the most effective (accurate) and economical (cheapest per query) option in your suite.

FireHydrant 4-Minute Demo

Get a quick walkthrough of the FireHydrant platform. FireHydrant is the all-in-one incident management platform that helps teams resolve incidents up to 90% faster — and prevent them from happening again. From flexible alerting and powerful automation to retros and AI insights, it brings clarity and control to every step of your response.

Pastries with SREs: Limitless observability and uncompromised donuts

In this episode of Pastries with SREs, we dig into Limitless Observability with a sweet side of unified observability strategy. If you're tired of siloed tools, fractured data, and swivel-chair investigations, this one’s for you. We explore: Why are silos still the norm in modern observability? What’s the true cost of inefficiencies across logs, metrics, and traces? How can SREs, IT operations, and dev teams shift to a no-compromise, unified observability model?

How to make Netflix reliable: Address low-hanging fruit

Reliability doesn’t have to be fancy and dramatic. Kolton and his team dramatically improved Netflix reliability by focusing on low-hanging fruit. FULL TRANSCRIPT: My first holiday peak at Netflix, where my VP of engineering came to me and he said, "Kolton, what do you think the chance we make it through the holiday peak without an outage is?" I thought about it for a minute and I said, "50/50."