The latest News and Information on DevOps, CI/CD, Automation and related technologies.

AI SRE in Practice: Enabling Non-Experts to Troubleshoot Kubernetes

Kubernetes troubleshooting traditionally requires deep platform expertise. Understanding pod lifecycle, decoding error messages, correlating events across resources, and identifying root cause all demand experience that takes years to build. This expertise gap creates a bottleneck where only senior engineers can handle production issues, limiting how quickly teams can resolve incidents.

How Gremlin makes disaster recovery testing easier and faster

There’s a common saying: “A backup isn’t a backup until you’ve tested it.” The same is true whether it’s a simple database failover or an entire data center/cloud provider failover. You simply won’t know if it works if you don’t test it. When it comes to disaster recovery testing, that can be an expensive, painful, and arduous process. But it’s required by companies for a reason. And not just for disasters like hurricanes, flooding, or earthquakes.

Beyond "Reactive" Accessibility: Meeting the 2026 ADA Title II Mandate in Higher Ed

For decades, digital accessibility in state-funded higher education has largely been a "reactive" game. If a student with a visual impairment reported an issue with a tuition portal, the university would scramble to provide an accommodation. As long as the institution could show "meaningful progress" toward compliance, it was generally shielded from significant legal repercussions. That era is officially ending. The U.S.

The post-mortem problem

Post-mortems are one of the most consistently underperforming rituals in software engineering. Most teams do them. Most teams know theirs aren't working. And most teams reach for the same diagnosis: the templates are too long, nobody has time, and nobody reads them anyway. These aren't wrong observations. But they're symptoms, not causes. The actual problem is that somewhere along the way, the post-mortem stopped being a piece of communication and became a compliance artifact.

How to Build AI-Native Security Resilience (And Finally Get Developers And Security On The Same Team) | Harness Blog

Developers and security professionals have struggled to get on the same page for what seems like forever, and AI is only widening that divide, according to results from our State of AI-Native Application Security 2025 research report.

The art of software engineering management

Like any leadership role, leading an engineering team in a mature, compact company like Raygun comes with both honor and responsibility. Leading a major development project is a bit like conducting a symphony orchestra, where every individual plays a crucial role and has a great impact on the work they release to customers and end-users.

Spring Boot API Testing: A Practical Guide for Enterprise Teams

Enterprise Spring Boot APIs should be tested at three levels: unit tests for business logic, integration tests for external service behavior, and traffic replay for production edge cases. Most teams only do the first. This guide shows all three using a real Spring Boot application that calls external APIs (SpaceX, US Treasury) with JWT authentication. The kind of service that looks simple in development and breaks in production.

Debugging Encrypted Microservice Traffic with Speedscale's eBPF Collector

Production bugs that only reproduce in actual traffic can be some of the most frustrating bugs in software development. You can stare at your logs, add traces to your code, add instrumentation – and still not be able to see the actual requests that went over the wire. And that gets even harder when the requests are encrypted and the system is a black box. You can use tools like Wireshark or Kubeshark to capture the requests.

The Ultimate Black Friday Technical Checklist: Prepare your infrastructure for Black Friday

Updated March 03, 2026. One of the things that keeps online shop owners awake at night is the question: will my website withstand the Black Friday traffic? Because this is one of the most important days of the year, even a few minutes of downtime can translate into thousands of dollars in losses. This is why we’ve decided to come to your aid with a hands-on article that covers the most common Black Friday problems for eCommerce websites and how you can avoid them.