
7 ways AI agents are transforming software delivery

For most teams, the slowest part of delivery isn’t writing code. It’s everything that happens after: automated tests, manual reviews, bug fixes, final approvals, and the long wait for deployment. The longer these phases run, the more expensive and painful late fixes become. As AI makes it easier to generate code at scale, those bottlenecks only get bigger.

When BGP becomes UX: The inside story of a SaaS routing decision gone wrong (or right)

Most operations teams trust their green dashboards. If the internal monitoring says everything is healthy, the app must be fine, right? But as the Internet keeps proving, what’s green inside the firewall can look red for customers outside of it. Sometimes, a single change in how web traffic moves can suddenly slow logins, disrupt websites, or hurt business results, even if everything looks fine inside.

What the 2025 DORA Report Teaches Us About Observability and Platform Quality

The 2025 DORA State of AI-Assisted Software Development Report delivers a critical insight for technology leaders: AI is fundamentally an amplifier, not a solution. It magnifies the strengths of high-performing organizations with robust observability while exposing the dysfunctions of struggling ones. For organizations that have rushed to adopt AI coding assistants while expecting immediate productivity gains, this finding demands a strategic pivot.

Inside the InfluxDB 3 Plugin Ecosystem

Companies today face growing pressure to manage and analyze massive flows of time series data, from IoT sensors to cloud-native infrastructure. Storing this information is relatively straightforward. The greater obstacle is keeping it useful and consistent while balancing a wide range of tools and modern technology platforms that continue to evolve.

A closer look at Grafana k6 browser: alignment with Playwright, modern features for frontend testing, and what's next

Over the years, we’ve seen our community embrace Grafana k6 browser as a key component of their frontend testing strategies. By helping collect frontend web vitals, capture custom metrics, and simulate user actions like clicking buttons or completing forms, the module offers teams a deeper understanding of performance and availability from their end users’ point of view.

Agentic AIOps in Action: LogicMonitor, IBM, and Red Hat Deliver Self-Healing IT

Your most skilled engineers shouldn’t be spending nights and weekends piecing together root causes of outages. Yet many organizations still rely on manual incident response across sprawling hybrid and multi-cloud environments. The result: slower resolution times, frustrated customers and lost revenue that can reach up to $1 million per hour according to IDC. At LogicMonitor, we believe the answer isn’t just better monitoring. It is systems that can heal themselves.

3 things you can do to get closer to five nines

5 minutes. That’s how much downtime some of the world’s largest enterprises will tolerate. For most organizations, five nines (99.999%) of availability sounds like a pipedream. But the trick to increasing availability isn’t massive infrastructure spending or complex system redesigns. All it takes are three key practices that any team can adopt and implement. In this post, we’ll present these practices and how we implement them at Gremlin.
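
The "5 minutes" figure follows from simple arithmetic on the availability target. A minimal sketch, assuming a 365-day year and treating availability as a plain fraction of total time (the function name and defaults are illustrative, not taken from the post):

```python
# Downtime budget implied by an availability target.
# Assumption: a 365-day year; availability is a fraction of total time.

MINUTES_PER_DAY = 24 * 60


def downtime_minutes_per_year(availability: float, days: int = 365) -> float:
    """Return the minutes of downtime per year allowed by an availability target."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * days * MINUTES_PER_DAY


if __name__ == "__main__":
    # Print the yearly budget for three, four, and five nines.
    for label, target in [("three nines", 0.999),
                          ("four nines", 0.9999),
                          ("five nines", 0.99999)]:
        print(f"{label}: {downtime_minutes_per_year(target):.2f} minutes/year")
```

Under these assumptions, five nines leaves a budget of roughly 5.26 minutes per year, which is where the round "5 minutes" comes from.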

Sending beers all across Belgium, a throwback to how we named Oh Dear

We're obviously a little biased, but we believe we have one of the best website monitoring tools on the market today, leading in features compared to our competitors. We've already tried a variety of marketing techniques to promote our service, but none really had the impact we were looking for. Maybe we're better at actually building good software than we are at marketing it? Or are we trying what everyone else is also doing, thus making it all harder?

OpsHelm goes multi-cloud with Aiven Diskless BYOC, cuts costs by 78% over MSK

In under a month, OpsHelm, the continuous, enriched changelog for cloud infrastructure, migrated its streaming backbone from MSK and NATS to Aiven Diskless Kafka (BYOC on AWS). The switch eliminated cross-cloud networking fees, collapsed multiple storage layers into one, and cut total streaming costs by 5x (from >$50,000/year to <$10,000/year) while giving the team a single logical event bus that stretches across multiple regions and accounts.

Cloud Microservices Monitoring on AWS and Azure with OpenTelemetry

Your checkout flow starts in an AWS Lambda function, calls a payment service running on EKS, then triggers notifications through Azure Functions. Three different compute platforms, two cloud providers, one distributed trace that you can't see. Cloud providers want you to use their native monitoring tools. AWS pushes X-Ray and CloudWatch. Azure promotes Application Insights and Azure Monitor. These tools work well within their ecosystems but lock you into vendor-specific implementations.
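
One way to picture a single trace spanning all three platforms is the W3C Trace Context `traceparent` header, which OpenTelemetry's default propagator uses. The sketch below is not the OpenTelemetry SDK and is not taken from the article. It uses only the Python standard library, and the helper names are hypothetical. It shows the core idea: each hop keeps the same trace ID and mints a new span ID.

```python
import re
import secrets

# W3C Trace Context header layout: version-traceid-parentid-flags
# (2, 32, 16, and 2 lowercase hex characters respectively).
TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def new_traceparent(sampled: bool = True) -> str:
    """Start a new trace, e.g. in the first function of the checkout flow."""
    trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex chars
    span_id = secrets.token_hex(8)    # 8 random bytes -> 16 hex chars
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def parse_traceparent(header: str) -> dict:
    """Validate a traceparent header and split it into its four fields."""
    match = TRACEPARENT_RE.match(header.strip().lower())
    if match is None:
        raise ValueError(f"malformed traceparent: {header!r}")
    version, trace_id, span_id, flags = match.groups()
    # All-zero trace or span IDs are invalid per the specification.
    if trace_id == "0" * 32 or span_id == "0" * 16:
        raise ValueError("all-zero trace or span id is invalid")
    return {"version": version, "trace_id": trace_id,
            "span_id": span_id, "flags": flags}


def child_traceparent(incoming: str) -> str:
    """Build the header a downstream call sends: same trace, new span."""
    ctx = parse_traceparent(incoming)
    return f"00-{ctx['trace_id']}-{secrets.token_hex(8)}-{ctx['flags']}"


if __name__ == "__main__":
    first_hop = new_traceparent()             # stands in for the Lambda function
    second_hop = child_traceparent(first_hop)   # stands in for the payment service
    third_hop = child_traceparent(second_hop)   # stands in for the notification function
    for name, header in [("hop 1", first_hop),
                         ("hop 2", second_hop),
                         ("hop 3", third_hop)]:
        print(name, header)
```

Because the trace ID is carried in an ordinary HTTP header, it survives the jump between cloud providers as long as every service forwards it. In practice an OpenTelemetry SDK would handle this propagation rather than hand-written helpers.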