6 Ways Ops Teams Can Align AI With Business Impact

AI adoption is at an all-time high, with over 70 percent of organizations using AI in at least one core function. Despite this, many operations teams still struggle to answer a basic question: "Is AI actually benefiting our business?" The challenge lies in the gap between AI systems and actual business results. Bridging that gap requires tying operational AI to revenue, customer, and growth metrics. Here are actionable steps to turn AI from a technical tool into a measurable business contributor.

How to Source the Right Solutions for Your Business

Running a business means consistently making decisions that support growth and success. A large part of that is building a strong supply chain and choosing the right vendors and solutions. Let's look at how to approach this.

Debugging multi-agent AI: When the failure is in the space between agents

I've been building a multi-agent research system. The idea is simple: give it a controversial technical topic like "Should we rewrite our Python backend in Rust?", and three agents work on it. An Advocate argues for it, a Skeptic argues against, and a Synthesizer reads both briefs blind and produces a balanced analysis. Each agent has its own model, its own tools, its own system prompt. It worked great in testing. Then I noticed the Synthesizer kept producing analyses that leaned heavily toward one side.

The End of Manual Instrumentation: Scaling Observability with OTel OBI & Coralogix

Traditionally, achieving deep visibility into distributed systems required significant trade-offs in engineering time. Collecting meaningful application metrics and traces required teams to embed language-specific agents, modify source code, or manage complex library dependencies across every service.

VictoriaMetrics at KubeCon: Optimizing Tail Sampling in OpenTelemetry with Retroactive Sampling

Last month, the VictoriaMetrics team gave a talk on retroactive sampling at KubeCon Europe 2026. This post is a transcript of that session and explains how retroactive sampling significantly reduces outbound traffic, CPU, and memory usage in the data collection pipeline compared to tail sampling in OpenTelemetry.

Smarter Alert Management: Test on Historical Data, Review Transitions, and Preview Silencing Schedules

Alert fatigue usually isn’t caused by one thing. It’s the accumulation of thresholds that are slightly too sensitive, alerts that fire during known maintenance windows, and historical patterns that nobody has the tools to review easily. Fixing it requires better visibility into how alerts actually behave over time, and a way to test changes before they hit production. We’ve shipped three improvements to alerting in Netdata that address different parts of this problem.

Faster code doesn't mean faster delivery

Software development has never moved this fast. JetBrains' 2026 AI Pulse Survey found that 90% of developers now use at least one AI tool at work. CircleCI's 2026 State of Software Delivery report, covering 28 million workflows across 22,000 organizations, found that daily CI workflow runs jumped 59% year over year, the largest single increase they've ever recorded. In that same period, CI success rates dropped to a five-year low.

Deployed Is Not the Same as Ready: How Mature Is Your Kubernetes Environment?

Kubernetes adoption is no longer the challenge it once was. More than 82% of enterprises run containers in production, most of them on multiple Kubernetes clusters. Adoption, however, does not mean operational maturity. These are two very different things. It is one thing to deploy workloads to a cluster or two and quite another to do it securely, efficiently and at scale. This distinction matters because the gap between adoption and Kubernetes operational maturity is where risk accumulates.

PagerDuty Invests in the AI-First Operations and Resilience of Healthcare and Crisis Response Organizations

At PagerDuty, we believe operational excellence and social impact are inseparable. As AI rapidly transforms how nonprofits operate, our AI and agentic technology empower mission-driven teams to automate complexity and focus their limited resources on what matters most: delivering reliable services that create meaningful impact at scale.