
RED Metrics & Monitoring: Using Rate, Errors, and Duration

The RED method is a streamlined approach to monitoring microservices and other request-driven applications. It focuses on three metrics: Rate (requests per second), Errors (the number or share of failed requests), and Duration (how long requests take). Derived from Google's "Four Golden Signals," the RED framework offers a pragmatic, user-centric perspective on service assurance and performance.
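As a rough illustration of the three metrics, the sketch below summarizes one observation window of request records. The `RequestRecord` shape, the `red_summary` helper, and the rule that only 5xx-style statuses count as errors are assumptions made for this example, not part of any particular monitoring tool.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestRecord:
    """One handled request (hypothetical shape for this sketch)."""
    duration_ms: float
    status: int  # HTTP-style status code


def percentile(sorted_values, q):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(q * len(sorted_values)))
    return sorted_values[rank - 1]


def red_summary(records, window_seconds):
    """Summarize Rate, Errors, and Duration for one observation window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    total = len(records)
    if total == 0:
        return {"rate_per_s": 0.0, "error_ratio": 0.0,
                "p50_ms": None, "p95_ms": None}
    # Assumption: only 5xx-style statuses count as errors.
    errors = sum(1 for r in records if r.status >= 500)
    durations = sorted(r.duration_ms for r in records)
    return {
        "rate_per_s": total / window_seconds,   # Rate
        "error_ratio": errors / total,          # Errors
        "p50_ms": percentile(durations, 0.50),  # Duration (median)
        "p95_ms": percentile(durations, 0.95),  # Duration (tail)
    }
```

Reporting Duration as percentiles rather than a mean is a common choice because a handful of slow requests can hide behind a healthy-looking average.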

When AWS Goes Down: What It Means For Your Cloud Costs

A global outage at Amazon Web Services (AWS) did more than knock popular apps offline. It laid bare the cost risks embedded in many cloud architectures. As services fail, the hidden costs of high availability, from redundancy planning to recovery operations, often multiply. For cloud cost leaders, this isn’t an issue of uptime; it’s a visibility and budget-shock issue. It’s a key reminder that architecting for resilience involves difficult trade-offs.

Navigating the Database Ecosystem in 2025

In 2025, the database ecosystem is more diverse and interconnected than ever before. From AI-assisted natural language queries that analyze your data to open table formats that make it easy to bridge systems, data infrastructure is moving towards openness, intelligence, and composability. Modern databases are no longer isolated systems; they are part of a broader ecosystem where interoperability is as important as performance.

Bridging partners in pursuit of agentic AI - Part 1: Why partnerships matter for enterprise intelligence

The pace of change in AI development has been dizzying. In just a few years, we’ve moved from experimenting with AI, machine learning (ML), retrieval-augmented generation (RAG), and agents to asking how these innovations can solve real business problems. Enterprises are no longer impressed by novelty and possibility; they expect outcomes.

Regain Control and Visibility of All IT Assets Across Your Organization

When you don’t have reliable processes for managing IT assets, you can quickly lose control. Asset inventories lose their accuracy, data across tools like CMDBs and spreadsheets stops matching reality, and no one can say with confidence what equipment is in use, where it’s located, how it’s connected, and whether it’s still needed. For data center professionals, a lack of asset visibility creates real risks.

Making logs work smarter: Evolving your observability strategy

When you start building an observability stack, it’s natural to reach for logs first. They’re familiar, easy to generate, and often already part of a developer’s workflow. Sending logs to a centralized system feels like a quick win, too: simply add a log shipper and, voilà, your application is observable.

Why GPUs accelerate AI learning: The power of parallel math

What makes GPUs so crucial for AI workloads? Is it just about raw processing power, or is there more to it? As we explore the world of AI infrastructure, understanding the role of GPUs is essential. Let's dive into the math behind AI. At its core, AI is all about mathematics, and matrix multiplication is a critical component. Whether you're training a model to recognize images or predict outcomes, the data is converted into massive arrays or matrices of numbers.
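To make the link between matrix multiplication and parallelism concrete, here is a minimal pure-Python sketch. The helper names `cell` and `matmul` are made up for this example. Each output cell is its own dot product and depends on no other cell, which is the property that lets parallel hardware compute many cells at once; this code still runs them one after another.

```python
def cell(a, b, i, j):
    """Dot product of row i of `a` and column j of `b`.

    It reads only its own row and column, so no output cell
    depends on the result of any other output cell.
    """
    return sum(a[i][k] * b[k][j] for k in range(len(b)))


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    if not a or not b or any(len(row) != len(b) for row in a):
        raise ValueError("inner dimensions must match")
    rows, cols = len(a), len(b[0])
    # Sequential here; on a GPU these independent cells can be
    # spread across many threads and computed simultaneously.
    return [[cell(a, b, i, j) for j in range(cols)] for i in range(rows)]


if __name__ == "__main__":
    print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```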

Secrets We Forgot... Until Automation Saved Us

We all have that one secret: the API key that has been sitting in production for ages, the personal access token that was supposed to be rotated two months ago, the service key that is about to expire… wait, when does it expire again? Most developers have worked with secrets. We create them, use them, and promise ourselves that we will rotate them. But somehow, the secret that was supposed to be rotated after 90 days is still going strong after six months. Sound familiar?
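A small sketch of the kind of check that automation might run: given the date each secret was last rotated, list the ones older than the policy allows. The `overdue_secrets` helper, the secret names, and the 90-day default are illustrative assumptions, not the API of any real secrets manager.

```python
from datetime import date, timedelta


def overdue_secrets(last_rotated, today, max_age_days=90):
    """Return the names of secrets older than `max_age_days`, sorted.

    `last_rotated` maps a secret name to the date it was last rotated.
    """
    limit = timedelta(days=max_age_days)
    return sorted(
        name for name, rotated in last_rotated.items()
        if today - rotated > limit
    )


if __name__ == "__main__":
    # Hypothetical inventory for demonstration only.
    inventory = {
        "prod-api-key": date(2025, 1, 1),
        "ci-access-token": date(2025, 6, 1),
    }
    print(overdue_secrets(inventory, today=date(2025, 7, 1)))
```

Passing `today` explicitly rather than calling `date.today()` inside the function keeps the check easy to test and to rerun for a past date.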

Unpatchable Vulnerabilities: Key Risk Mitigation Strategies

Wouldn’t it be great if every vulnerability had a fix waiting in the wings? If patching were always fast, easy, and complete? That’s not the world we live in. Some vulnerabilities can’t be patched at all. Others are buried in systems or services you don’t fully control. And the longer your focus stays limited to internal infrastructure, the more risk slips through the cracks.