What is operational excellence?

Engineering teams are great at innovating and delivering products, but the work that's required to maintain them over time and keep them running well tends to get deprioritized. Planning processes are designed to move features forward, not to catch whether those features are generating too many alerts, degrading in performance, or creating compliance exposure over time. As a result, that class of work accumulates quietly.

Cortex and Syntasso join forces to bridge the gap between automation and visibility

I've spent a lot of time talking to platform teams who feel like they're running in circles. They build incredible automation to speed up service delivery, but even when it's running perfectly, nobody actually knows what's happening across the organization. It's hard to see who owns which service or if those services even meet basic company standards. Automation's a great start, but it usually hits a wall when you try to scale it.

How to stop guessing where developer friction lives

Most platform teams know friction is a problem. They also struggle to figure out exactly where that friction lives. Developers lose time in ways that rarely show up on a roadmap. In many organizations, creating a new service can require multiple approvals and several Slack threads. Spinning up infrastructure can mean filing a ticket and waiting days. Onboarding to a new codebase involves a scavenger hunt through stale Confluence pages. None of these feel like emergencies in isolation.

What is engineering operations? A guide to the discipline transforming software teams

Engineering teams are writing more code than ever. AI coding tools have made individual developers dramatically more productive, yet most organizations report moving only about 20% faster than before. The real constraint has always been the operational fabric surrounding the act of writing code. The processes, standards, visibility, and coordination that determine whether hundreds of engineers and thousands of services ship reliable software at speed have always been where the real work happens.

Debugging Encrypted Microservice Traffic with Speedscale's eBPF Collector

Production bugs that only reproduce in actual traffic can be some of the most frustrating bugs in software development. You can stare at your logs, add traces to your code, add instrumentation – and still not be able to see the actual requests that went over the wire. And that gets even harder when the requests are encrypted and the system is a black box. You can use tools like Wireshark or Kubeshark to capture the requests.

Introducing Cortex as the Engineering Operations Platform

Software engineering is once again being forced to evolve. We are entering the era of infinite code, where the cost of writing code tends to zero. The data tells us that companies are only moving 20% faster than when humans wrote code by hand. We’re writing orders of magnitude more code than ever, yet our processes are barely keeping up with what we had before. The chaos and complexity are only being amplified by this new shift in how we work as developers.

Why measuring things openly is the first step toward a stronger engineering culture

Most engineering leaders know they should be measuring more. What holds many of them back is a quieter concern about whether the organization is actually ready to see the numbers. This tension, however, did not keep Ganesh Datta, our co-founder and CTO, and Randy Shoup, SVP of Engineering at Thrive Market, from diving down this rabbit hole on the Braintrust podcast.

Why business context is the missing link in engineering performance

Think about the last time your team shipped something impressive. It was probably delivered on time, with clean code and great metrics. And yet somewhere along the way, the business priorities had shifted, and what the team delivered was no longer the top priority. The work was solid, but the direction just wasn't quite right anymore. This is usually what happens when engineers are disconnected from business context.

Breaking up with Backstage: Why "free" open source isn't always free

We’ve all had that moment where it seems like you've solved your company's biggest engineering challenges after a weekend of hacking something together. Your prototype is so good, you feel, that the obvious next steps are to build a slide deck, rally the team around your work, and prepare the ticker tape parade for your hero's welcome. Jeff Schnitter, a Solution Architect at Cortex, knows this roller coaster of an experience all too well after his time at Workday.

Troubleshooting Microservices with OpenTelemetry Distributed Tracing

Distributed tracing doesn’t just show you what happened. It shows you why things broke. While logs tell you a service returned a 500 error and metrics show latency spiked, only traces reveal the full chain of causation: the upstream timeout that triggered a retry storm, the N+1 query pattern that saturated your connection pool, or the missing cache hit that turned a 50ms call into a 3-second database roundtrip.