
The latest News and Information on DevOps, CI/CD, Automation and related technologies.

Let Your LLM Debug Using Production Recordings

Modern LLM coding agents are great at reading code, but they still make assumptions. When something breaks in production, those assumptions can slow you down—especially when the real issue lives in live traffic, API responses, or database behavior. In this post, I’ll walk through how to connect an MCP server to your LLM coding assistant so it can pull real production data on demand, validate its assumptions, and help you debug faster.
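The core idea can be sketched without any particular vendor's SDK: expose production recordings behind named "tools" the assistant can invoke instead of guessing. This is a simplified, stdlib-only illustration, not an actual MCP server; the tool name, the recorded data, and the `/api/user/7` path are all hypothetical.

```python
# Hypothetical store of recorded production responses, keyed by path.
RECORDED_RESPONSES = {
    "/api/user/7": {"status": 200, "body": {"plan": "free", "beta": True}},
}

# Registry of callable tools the assistant is allowed to invoke.
TOOLS = {}

def tool(fn):
    """Register a function as a named tool."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def get_recorded_response(path: str):
    """Return the most recent recorded production response for a path,
    so the assistant can validate its assumptions against real traffic."""
    return RECORDED_RESPONSES.get(path)

# Instead of assuming the response shape, the assistant asks for real data:
print(TOOLS["get_recorded_response"]("/api/user/7")["body"]["plan"])  # free
```

A real MCP server adds transport, schemas, and discovery on top of this pattern, but the debugging win is the same: the model checks live data before committing to a hypothesis.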

AI SRE in Practice: Resolving GPU Hardware Failures in Seconds

When a pod fails during a TensorFlow training job, the investigation usually starts with the obvious questions. The answers rarely come quickly, especially when the failure involves GPU hardware that most engineers don’t troubleshoot regularly. This scenario walks through an actual GPU hardware failure and shows how AI-augmented investigation changes both the time to resolution and the expertise required to handle it.

Cloud Strategy for 2026: The Year of Repatriation, Resilience, and Regional Rebalancing

2026 is set to be a pivotal year for cloud strategy, with repatriation gaining momentum under shifting legislative, geopolitical, and technological pressures, and a growing focus on data sovereignty accelerating the trend. Together, these forces have set the stage for 2026 to be the year of repatriation, resilience, and regional rebalancing. Here, Rob Coupland, Chief Executive Officer at Pulsant, offers his insights.

Speedscale vs. LocalStack for Realistic Mocks

API mocking plays a crucial role in modern software development, allowing developers to simulate external API endpoints. It’s an effective way to isolate your application for testing and ensure that code changes don’t inadvertently break critical dependencies. In short, API mocking helps you build robust, reliable software by letting you test how your application interacts with external services.
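The isolation idea described above can be shown with plain Python, independent of either tool: business logic that depends on an external API is exercised against a canned response instead of the live service. The `checkout` function and the payment-gateway response shape are hypothetical stand-ins.

```python
def checkout(amount, gateway_charge):
    """Business logic that depends on an external payment API,
    injected as a callable so tests can substitute a mock."""
    response = gateway_charge(amount)
    if response["status"] != "succeeded":
        raise RuntimeError("payment failed")
    return response["id"]

def fake_charge(amount):
    """Mock of the external endpoint: returns a canned success
    response with no network I/O."""
    return {"status": "succeeded", "id": "ch_test_123", "amount": amount}

# The code path runs fully, even if the real gateway is down or rate-limited.
print(checkout(42, fake_charge))  # ch_test_123
```

Tools like Speedscale and LocalStack operate at a different layer (replaying recorded traffic and emulating cloud services, respectively), but the goal is the same: deterministic tests without the real dependency.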

How to Do Full-Text Search Across All Application Traffic with Speedscale

Modern DevOps observability tools are excellent for monitoring system health, tracking distributed traces, and aggregating metrics. However, they lack the fidelity needed for full-text search across application traffic. While observability platforms excel at showing what happened and when, they often fall short when you need to find where a specific piece of data (like an email address, user ID, or transaction token) appears as it flows through your entire application stack.
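At its simplest, the capability described above is a substring scan over every captured request and response body. This sketch is not Speedscale's actual API; the captured records and paths are invented to show the shape of the search.

```python
# Hypothetical captured traffic: one record per request/response exchange.
captured = [
    {"path": "/login", "request": '{"user": "ada@example.com"}',
     "response": '{"token": "abc123"}'},
    {"path": "/orders", "request": '{"id": 7}',
     "response": '{"buyer": "ada@example.com", "total": 19.99}'},
]

def search_traffic(records, needle):
    """Return the paths of all exchanges whose request or response
    body contains the needle (e.g. an email, user ID, or token)."""
    return [r["path"] for r in records
            if needle in r["request"] or needle in r["response"]]

# Find everywhere a specific email appears across the stack:
print(search_traffic(captured, "ada@example.com"))  # ['/login', '/orders']
```

A production implementation indexes the bodies rather than scanning linearly, but the result is what metrics and traces can't give you: every hop where a specific value actually appeared.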

Top 7 Kubernetes Add-ons

The open-source Kubernetes platform is designed to simplify application deployment through Linux containers. It supports tasks like deploying workloads as pods, clustering nodes, managing container runtimes, and tracking resources. Kubernetes has risen in popularity over the last several years as a straightforward way to support, scale, and manage applications.

How Standardizing Dev Workflows Boosts Velocity, Quality & Joy - with Jason Gates

What if your dev team loved their workflows? Jason Gates from Sandia National Labs joins GitKraken’s VP of Developer Research, Jeremy Castile, to unpack the real-world challenges and powerful benefits of developer workflow standardization. In this candid conversation, Jason shares lessons from helping dozens of teams improve their software delivery — from reducing friction and boosting velocity, to creating joyful, productive developer experiences.

A buyer's guide to engineering intelligence platforms in 2026

You're in a planning meeting when someone asks a simple question. How long does it actually take your team to ship a feature? You've got spreadsheets, Git logs, and Jira exports scattered across three tabs, and you still can't give a confident answer. It's a question you should be able to answer instantly, but the data lives in too many places to stitch together on the fly.
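The calculation behind that answer is simple once the data is in one place. This is a hedged sketch of the kind of metric an engineering intelligence platform automates: median lead time from first commit to deployment, over invented feature records (the field names and dates are hypothetical).

```python
from datetime import datetime
from statistics import median

# Hypothetical per-feature records, stitched from Git and ticket data.
features = [
    {"first_commit": "2026-01-02", "deployed": "2026-01-09"},
    {"first_commit": "2026-01-05", "deployed": "2026-01-19"},
    {"first_commit": "2026-01-10", "deployed": "2026-01-13"},
]

def lead_time_days(record):
    """Days elapsed from a feature's first commit to its deployment."""
    start = datetime.fromisoformat(record["first_commit"])
    end = datetime.fromisoformat(record["deployed"])
    return (end - start).days

# Median is preferred over mean so one stalled feature doesn't skew the answer.
print(median(lead_time_days(f) for f in features))  # 7
```

The hard part a platform solves isn't this arithmetic; it's reliably joining the spreadsheets, Git logs, and Jira exports into records like these in the first place.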

AI coding assistants are only as good as the context you give them

AI coding assistants have quickly become part of everyday development. Teams now rely on them to explain unfamiliar code, suggest configuration files, debug errors, and accelerate delivery across the stack. But as these tools move from experimentation into real production workflows, a consistent pattern is emerging: AI breaks down at the platform boundary.