The latest News and Information on DevOps, CI/CD, Automation and related technologies.

When Your Observability Literally Stops Traffic

Last week, a fleet of autonomous robotaxis in China suddenly stopped working—at scale. Over a hundred vehicles stalled across a city, stranding passengers in traffic and raising immediate concerns about safety, reliability, and trust in autonomous systems. This wasn’t just a bad day for self-driving cars. It was a distributed systems failure, one that happened in the physical world, not just in dashboards.

OpenTelemetry Trace Testing for CI Release Gates

OpenTelemetry is great at answering one question: “what just broke?” The problem is that most teams need a different answer first: “what is about to break in this release?” That is where trace-based testing comes in, especially for teams running a vendor-neutral OTel stack (Collector + Tempo/Jaeger + Prometheus) and needing deterministic release gates.
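To make the idea concrete, here is a minimal sketch of a trace-based release gate in Python. It assumes the traces produced by a CI test run have already been fetched from the backend (Tempo or Jaeger) and flattened into a simplified, hypothetical `Span` record. The `check_trace` and `release_gate` helpers and the three assertion types (required spans, no error status, per-span latency budgets) are illustrative choices. They are not the OpenTelemetry API or any specific tool's interface.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical, simplified span record; real OTel spans carry many more fields."""
    name: str
    service: str
    duration_ms: float
    status: str = "OK"  # "OK" or "ERROR"


def check_trace(spans, required_spans, max_duration_ms):
    """Return human-readable assertion failures for one trace (empty list means pass)."""
    failures = []
    names = {s.name for s in spans}
    # Assertion 1: every expected operation actually produced a span.
    for required in sorted(required_spans):
        if required not in names:
            failures.append(f"missing span: {required}")
    for s in spans:
        # Assertion 2: no span finished with an error status.
        if s.status == "ERROR":
            failures.append(f"error status on span: {s.name} ({s.service})")
        # Assertion 3: spans that have a latency budget stayed within it.
        limit = max_duration_ms.get(s.name)
        if limit is not None and s.duration_ms > limit:
            failures.append(f"span {s.name} took {s.duration_ms} ms (limit {limit} ms)")
    return failures


def release_gate(traces, required_spans, max_duration_ms):
    """Pass only if every trace from the test run satisfies all assertions."""
    failures = [
        f"{trace_id}: {failure}"
        for trace_id, spans in traces.items()
        for failure in check_trace(spans, required_spans, max_duration_ms)
    ]
    return not failures, failures


if __name__ == "__main__":
    traces = {
        "trace-1": [Span("POST /checkout", "api", 180.0), Span("charge-card", "payments", 95.0)],
        "trace-2": [
            Span("POST /checkout", "api", 420.0),
            Span("charge-card", "payments", 310.0, status="ERROR"),
        ],
    }
    passed, failures = release_gate(
        traces,
        required_spans={"POST /checkout", "charge-card"},
        max_duration_ms={"POST /checkout": 300.0},
    )
    print("PASS" if passed else "FAIL")
    for failure in failures:
        print(" ", failure)
```

In a CI pipeline, a non-empty failure list would typically be turned into a non-zero exit code so the job blocks the release.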

From IC to VP: Engineering Leadership at Every Level, with Box's Tamar Bercovici

Cortex co-founder and CTO Ganesh Datta sits down with Tamar Bercovici, VP of Engineering at Box, who has spent 15 years at the company, growing from senior IC to leading its core platform organization, to talk about what engineering leadership looks like at each level of the org.

AI Enablement for Dev Teams: The 6-Pillar Flywheel

AI adoption is already happening on your team, whether you have a strategy or not. Tracy Lee (CEO of This Dot Labs, Microsoft MVP, Google Developer Expert) breaks down the AI Enablement Flywheel — a 6-pillar framework used by successful engineering organizations to move from scattered experimentation to scalable, ROI-positive AI workflows.

Rovo Chat in Bitbucket now understands your Pipelines

Why did your build fail? Ask Rovo, get a clear answer, and even a way to fix it, from anywhere in Bitbucket. Pipeline debugging is one of the most common and most painful parts of the development workflow. Our Atlassian research found that AI adoption is rising, but friction persists: over 50% of developers reported losing more than 10 hours each week searching for information, onboarding to new code, or toggling between apps.

Every engineering org is taking an AI readiness test right now

Tamar Bercovici has been at Box for 15 years. She leads the core platform, the backend layer that storage, search, metadata, and AI capabilities all run on. When her systems go down, Box goes down. On a recent episode of the Braintrust podcast, she said the debate around AI-generated code tends to focus on whether the models will write clean code and/or introduce bugs. Tamar's focus is somewhere else entirely.

Building a single pane of glass for enterprise Kubernetes fleets

A Kubernetes single pane of glass is a centralized management layer that unifies visibility, access control, cost allocation, and policy enforcement across every cluster in an enterprise fleet, regardless of cloud provider. It replaces the fragmented practice of switching between AWS, GCP, and Azure consoles to govern infrastructure, giving platform teams a single source of truth for multi-cloud Kubernetes operations.
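As a rough illustration of the "single source of truth" idea, the sketch below normalizes per-provider cluster records into one provider-neutral `Cluster` model, then runs a fleet-wide policy check and a per-team cost roll-up over it. The input field names in `FIELD_MAP`, the minimum-version policy, and the cost figures are hypothetical. Real AWS, GCP, and Azure APIs return different, richer payloads, and access control, which a production platform would also need, is omitted here.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical, illustrative field names: each provider's real API payload differs.
FIELD_MAP = {
    "aws": {"name": "clusterName", "region": "region", "version": "k8sVersion", "team": "ownerTeam"},
    "gcp": {"name": "name", "region": "location", "version": "masterVersion", "team": "team"},
    "azure": {"name": "name", "region": "location", "version": "kubernetesVersion", "team": "owner"},
}


@dataclass(frozen=True)
class Cluster:
    """One provider-neutral row per cluster in the fleet inventory."""
    provider: str
    name: str
    region: str
    version: tuple  # (major, minor), e.g. (1, 29)
    team: str
    monthly_cost: float


def parse_version(raw: str) -> tuple:
    # Keep only major.minor so "1.29.3-gke.100" and "1.29" compare the same way.
    major, minor = raw.split(".")[:2]
    return int(major), int(minor)


def normalize(provider: str, record: dict, monthly_cost: float) -> Cluster:
    """Map a provider-specific record onto the shared model (cost comes from billing data)."""
    fields = FIELD_MAP[provider]
    return Cluster(
        provider=provider,
        name=record[fields["name"]],
        region=record[fields["region"]],
        version=parse_version(record[fields["version"]]),
        team=record[fields["team"]],
        monthly_cost=monthly_cost,
    )


def policy_violations(fleet, min_version):
    """Fleet-wide policy check: clusters running below the minimum Kubernetes version."""
    return sorted(f"{c.provider}/{c.name}" for c in fleet if c.version < min_version)


def cost_by_team(fleet):
    """Cost allocation: total monthly spend per owning team across all providers."""
    totals = defaultdict(float)
    for c in fleet:
        totals[c.team] += c.monthly_cost
    return dict(totals)


if __name__ == "__main__":
    fleet = [
        normalize("aws", {"clusterName": "prod-eks", "region": "us-east-1",
                          "k8sVersion": "1.29", "ownerTeam": "payments"}, 4200.0),
        normalize("gcp", {"name": "analytics", "location": "europe-west1",
                          "masterVersion": "1.27.8-gke.1067", "team": "data"}, 3100.0),
        normalize("azure", {"name": "internal-aks", "location": "westeurope",
                            "kubernetesVersion": "1.28.5", "owner": "payments"}, 900.0),
    ]
    print(policy_violations(fleet, (1, 28)))  # ['gcp/analytics']
    print(cost_by_team(fleet))  # {'payments': 5100.0, 'data': 3100.0}
```

The design choice worth noting is that policy and cost logic only ever see the normalized model, so adding another provider means adding one `FIELD_MAP` entry rather than touching every check.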

Load Testing vs. Stress Testing | Resilience Testing | Harness

Load testing and stress testing are two important parts of performance testing, but they serve very different purposes. Load testing checks how your application behaves when many users access it at the same time under normal or expected conditions. It helps you understand if your system can handle real-world traffic smoothly without slowing down.
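The difference between the two can be sketched in a few lines of Python. The toy latency model below (`simulated_latency_ms`, flat up to a hypothetical capacity and then degrading linearly) stands in for a real system under test. The load test holds the expected number of concurrent users and checks a latency target; the stress test keeps ramping concurrency past that level to find where the target first breaks. The model, thresholds, and step sizes are illustrative assumptions. A real test would drive actual traffic with a load-generation tool and measure latency percentiles.

```python
from typing import Callable, Optional


def simulated_latency_ms(concurrency: int, capacity: int = 100, base_ms: float = 50.0) -> float:
    """Hypothetical toy model: latency is flat up to capacity, then grows with overload."""
    if concurrency <= capacity:
        return base_ms
    return base_ms * concurrency / capacity


def load_test(latency_fn: Callable[[int], float], expected_users: int, slo_ms: float) -> bool:
    """Load test: run at the expected (normal or peak) concurrency and check the target holds."""
    return latency_fn(expected_users) <= slo_ms


def stress_test(latency_fn: Callable[[int], float], start: int, step: int,
                max_users: int, slo_ms: float) -> Optional[int]:
    """Stress test: ramp concurrency beyond expected levels until the target first breaks."""
    if step <= 0:
        raise ValueError("step must be positive")
    users = start
    while users <= max_users:
        if latency_fn(users) > slo_ms:
            return users  # breaking point: first level where latency exceeds the target
        users += step
    return None  # no breaking point found within the tested range


if __name__ == "__main__":
    print("load test passed:", load_test(simulated_latency_ms, expected_users=80, slo_ms=100.0))
    print("breaking point:", stress_test(simulated_latency_ms, start=50, step=25,
                                         max_users=500, slo_ms=100.0))
```

With these toy parameters, the load test at 80 users passes, and the stress test reports a breaking point of 225 concurrent users, the first step at which the modeled latency (112.5 ms) exceeds the 100 ms target.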