The latest News and Information on DevOps, CI/CD, Automation and related technologies.

Netdata Overview: All You Need to Know in Under 3 Minutes

In just a few minutes, this walkthrough will show you how to unlock the full power of Netdata during your trial period. From real-time metrics to AI-powered insights, learn how to get immediate value without any guesswork. Whether you're running a homelab or managing production systems at scale, this video will help you hit the ground running and make every minute of your trial count. Let’s turn your trial into insight, clarity, and control.

9 Best Incident Response Tools (Plus 4 Open-Source Options)

I’ve curated a list of the 9 best incident response tools, plus 4 open-source options for you. But first, a quick note: many people mix up alerting, monitoring, and incident response. Incident response is what you do after receiving an alert. It includes alert acknowledgment, escalations, incident communication, post-incident analysis, and response automation. Yes, some of these (incident communication and post-incident analysis) overlap with incident management.

Kubernetes Is Powerful, But It's Slowing You Down. Here's How to Fix It.

Ask any SRE what slows them down in a Kubernetes incident, and the answer is usually too much information in too many different places. Kubernetes has changed the way we run software. It’s given us incredible flexibility, scalability, and power. But in the years I’ve worked in cloud operations and platform engineering, I’ve also seen how that power comes at a price: complexity.

Factors That Define a Scalable Reseller Hosting Plan

Many entrepreneurs are drawn to reseller hosting as an accessible and profitable business model. As you explore various options, it's important to understand the factors that contribute to a scalable reseller hosting plan. A plan that supports growth must include key elements like performance, flexibility, price, and support. Let's break down these crucial aspects in more detail.

How to monitor and manage front-end observability in Blackfire

In this video, we'll guide you through the process of monitoring and managing your usage of front-end observability features in Blackfire. Learn how to access your Browser usage dashboard to view browser traces collected per environment, track your quota consumption, and understand the concept of spike protection. You'll discover how Blackfire's automatic detection of abnormal traffic spikes protects your monthly quota and ensures continuous data collection.

How to Enable and Configure Front-end Observability in Blackfire

In this video, learn how to enable and configure Front-end Observability in Blackfire. The tutorial covers steps to enable features across multiple environments via the Organization settings / Front-end usage in the Blackfire dashboard. Control front-end observability by enabling or disabling Browser Monitoring and Analytics per environment, using a JavaScript probe and a unique browser key. The video emphasizes the importance of naming transactions and explains how to manually add tracking snippets to HTML for better control.

Zero Ticket Video Series: How to Automate Password Resets with Resolve

Struggling with repetitive IT tickets like password resets and account unlocks? You're not alone — these make up nearly 30% of all service desk requests. In this demo, learn how RITA, the AI-powered IT Agent from Resolve, can eliminate these issues entirely — no ticket required.

The Tech Behind Europe's Space Missions | Canonical x ESA

“Open source software is… the glue for everything that everyone does, from sending an email through to managing critical operations, not just space operations.” The European Space Agency (ESA) runs missions ranging from investigating Earth’s forests, to exploring Jupiter’s moons, to deflecting incoming asteroids.

Reliability is not about mythical perfection

See what reliability means to Ganesh Seetharaman, Managing Director at Deloitte, and why it's more than high uptime. Full transcript: Reliability to me is not about achieving mythical perfection. It's about embracing complexity, recovering quickly from failures or incidents, and building trust through transparency and adaptability.