Incident correlation: Cross-domain visibility. Smarter triage. Faster L1 teams.

IT incidents are rarely isolated. A network disruption can degrade infrastructure, which in turn causes application errors and, eventually, a flood of user complaints. When an L1 operator looks at a single incident, they see only part of the story. Outside their immediate scope, other incidents are occurring that are either directly related to it or driven by the same underlying cause. Without broader visibility, the operator has no way to know.

What Are AI Inference Costs? [And How To Manage Them]

If you’re building or running AI-powered features in production, you need a clear understanding of inference costs. Get it right, and you can turn your AI investments into profitable growth. As Larry Advey, Director of Cloud Platform and FinOps at CloudZero and a member of the FinOps Foundation Technical Advisory Council, puts it: “AI investments will only continue to grow.

CloudZero Brings Cloud Cost Intelligence to 13 AI Coding Tools - Cursor, Copilot, and More

Earlier this month, we announced the CloudZero Claude Code Plugin and the CloudZero AI Hub — the first step toward putting your cloud cost data directly inside the AI tools your team already uses. The feedback from customers was clear. They said engineers and FinOps teams wanted more tools and more ways to get answers from CloudZero without switching context. Today, we’re delivering more.

How to Measure MOS Score for VoIP (Step-by-Step)

Poor voice call quality isn't just annoying; it's a productivity killer. Dropped calls mid-negotiation, garbled audio on client meetings, and one-sided conversations where half the words don't make it through: these aren't random technical glitches. They're symptoms of network performance problems that haven't been identified, measured, or fixed. And when your business runs on VoIP, Microsoft Teams, or any cloud-based communication platform, unmeasured voice quality is a liability.

From raw data to flame graphs: A deep dive into how the OpenTelemetry eBPF profiler symbolizes Go

Imagine you're troubleshooting a production issue: your application is slow, the CPU is spiking, and users are complaining. You turn to your profiler for answers—after all, this is exactly what it's built for. The profiler runs, collecting thousands of stack samples. eBPF profilers, including the OpenTelemetry eBPF profiler, operate at the kernel level, so they capture raw program counters: memory addresses pointing into your binary.
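To make "raw program counters" concrete, here is a minimal sketch of the general idea behind symbolization: mapping an address back to the function that contains it. It is not the OpenTelemetry eBPF profiler's implementation, and the article's Go-specific approach may differ. The `Symbol` type, the helper names, and the addresses in the demo table are all hypothetical.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Optional, Tuple


class Symbol(NamedTuple):
    """One function entry from a (hypothetical) symbol table."""
    start: int  # address of the function's first instruction
    size: int   # length of the function in bytes
    name: str


def build_index(symbols: List[Symbol]) -> Tuple[List[int], List[Symbol]]:
    """Sort symbols by start address so lookups can use binary search."""
    ordered = sorted(symbols, key=lambda s: s.start)
    starts = [s.start for s in ordered]
    return starts, ordered


def symbolize(pc: int, starts: List[int], ordered: List[Symbol]) -> Optional[str]:
    """Return 'name+0xoffset' for the function containing pc, else None."""
    # Find the last symbol whose start address is <= pc.
    i = bisect_right(starts, pc) - 1
    if i < 0:
        return None  # pc lies below every known function
    sym = ordered[i]
    if pc >= sym.start + sym.size:
        return None  # pc falls in a gap after the function's end
    return f"{sym.name}+0x{pc - sym.start:x}"


# Hypothetical table; real addresses come from the binary's symbol data.
STARTS, ORDERED = build_index([
    Symbol(0x401040, 0x80, "main.handle"),
    Symbol(0x401000, 0x40, "main.main"),
])

if __name__ == "__main__":
    for pc in (0x401000, 0x401044, 0x4010C0):
        print(hex(pc), "->", symbolize(pc, STARTS, ORDERED))
```

Sorting once and using binary search keeps each lookup logarithmic in the number of functions, which matters when a profiler has thousands of samples to resolve.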

When Code Becomes Cheap: The New Reliability Constraint in Software Engineering

For most of the history of software engineering, the primary constraint was production. Code was expensive, skilled engineers were scarce, and shipping features required concentrated human effort. Velocity was limited by how fast people could reason, implement, test, and deploy. That constraint shaped everything from team size, architecture, and release cadence to how we thought about technical debt. When production is expensive, you optimise for output. You remove friction from shipping.

An Oh Dear skill for use in Claude Code or Codex

AI coding agents are getting good at calling tools. Claude Code, Codex, and others can run shell commands, parse JSON, and reason about the results. But they need to know what tools are available and how to use them. That's what skills are for. A skill is a small package of documentation that teaches an AI agent how to use a specific tool. We've built one for Oh Dear.

CertKit Keystore: Private keys that never leave your infrastructure

When you use CertKit, your private keys live in CertKit’s database, encrypted at rest. We’ve written about why the actual risk is smaller than it sounds. But some organizations have policies that prohibit storing private keys with any third party, regardless of how they’re protected. That policy isn’t going away. The Local Keystore enables those organizations to use CertKit and still keep their keys local.

Monitor Juniper Mist in Datadog

From point-of-sale (POS) terminals to cloud-based applications and mobile devices, reliable connectivity is critical to business operations. Even brief disruptions can negatively impact user experiences, resulting in failed transactions, delayed application responses, or repeated attempts to reconnect. Juniper Mist is an AI-powered networking platform that provides insight into wireless environments, including access point performance and radio frequency health.

A new Host Map for modern infrastructure

A host map is a visual representation of your infrastructure that displays hosts and related resources such as clusters, pods, and containers in a single, interactive view. We introduced the Datadog Host Map more than a decade ago to help you “know thy infrastructure” and answer critical questions: Does everything look healthy? Has anything changed? Does the shape of my environment match what I expect?