Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on DevOps, CI/CD, Automation and related technologies.

How Roblox uses HAProxy Enterprise to power gaming for 100 million daily users

One of the most anticipated presentations at HAProxyConf 2025 came from gaming and user-generated content (UGC) innovators Roblox. Software Engineer Chris Jones and Senior Site Reliability Engineer Ben Meidel gave an enthusiastic and enjoyable presentation, detailing their journey from legacy hardware to a sophisticated, automated, and secure application delivery platform, with seamless, API-powered dynamic configuration and upgrades, supported by the HAProxy Enterprise Dynamic Update Module.

New Feature Friday: Cortex & AWS

Most teams treat AWS like a black box. Cortex turns the lights on. We now automatically ingest all your AWS resources—from Lambda to RDS—and map them to the services and teams that actually own them. Daily. Automatically. No spreadsheets. No guesswork. Scorecards help you enforce real standards (think: runtime upgrades, tagging hygiene, EOL migrations). Workflows help your engineers self-serve AWS resources without needing to be AWS experts.

Welcome to the Next Frontier: AI on Kubernetes

Last week’s KubeCon Atlanta made one thing abundantly clear: Kubernetes is quickly becoming the de facto platform for AI workloads. The event lineup was full of talks, workshops, and even co-located events dedicated to AI, machine learning, and running data on Kubernetes natively, with approximately 50 sessions in total focused on AI, ML, LLM, and GenAI topics. What was until now mostly PoCs and aspiration is now truly delivering in production.

Resolve's Zero Ticket Minute - Ep. 2

Last month, Azure + AWS outages spiked global incidents by 250%. Help desks lit up fast. Zero Ticket IT keeps teams steady with proactive updates and instant deflection of those “is it down?” floods. Don’t miss your 60-second IT news hit.

VirtualMetric DataStream + Elasticsearch: A Smarter Way to Send Logs to Elastic

Elasticsearch has long been the backbone of security analytics for organizations that need fast search, flexible dashboards, and scalable visibility across massive datasets. It powers everything from threat hunting to compliance reporting and real-time investigation. But anyone who has operated Elasticsearch at scale also knows a quiet truth: Elasticsearch is only as strong as the data you feed it. And getting clean, consistent, usable telemetry into Elastic is often the hardest part.

Incident Postmortem: How to Learn From Failures and Build Reliable Systems

When the incident settles and systems are back up, one question always remains: What actually happened, and how do we stop it from happening again? That’s where incident postmortems come in. Not just as documentation, but as a structured way to learn, improve reliability, and replace guessing with clarity. A good postmortem isn’t about blame, heroics, or perfect narratives. It’s about truth, learning, and building systems that get stronger with every failure.

7 Common Incident Response Challenges and How to Overcome Them

Incident response teams deal with several challenges: alert noise, unclear ownership, lack of automation, and more. It’s important to keep an eye on these challenges and resolve them as they appear, because they can turn minor issues into major outages. In this blog, we’ll discuss some of the common incident response challenges, how they affect your team, and how you can resolve them. Let’s dive in!

Incident Response Team: Roles, Responsibilities, and Structure Explained

Incidents don’t wait. They hit production, disrupt users, and pull teams into long recovery cycles. A well-structured incident response team helps you move fast, limit damage, and restore services without chaos. In this blog, we’ll explain what an incident response team is, its key functions, team composition, and different types of teams. Let’s get started!

New in Redgate Flyway Enterprise - Drift detection and rollbacks just got easier

In our latest Redgate Flyway Enterprise release, you can store a snapshot directly in the target database, making drift detection and rollback strategies easier and more reliable whether you’re using state-based or migrations-based deployments.

Instrument Jenkins With OpenTelemetry

You can instrument Jenkins with OpenTelemetry using the official plugin and an OpenTelemetry Collector, then send the data to a backend like Last9 to understand where pipeline latency and failures actually originate. Jenkins provides job status and console logs, but it doesn't show how time is distributed across stages, agents, plugins, and external systems. OpenTelemetry fills that gap by emitting traces, metrics, and logs in a standard format that any OTLP-compatible backend can process.
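
To make the idea of "how time is distributed across stages" concrete, here is a minimal Python sketch. It assumes trace spans have already been exported and flattened into simple records. The `Span` class, the `time_by_stage` and `slowest_stage` helpers, and the sample data are all hypothetical illustrations, not the Jenkins plugin's schema or any backend's API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Span:
    """Simplified stand-in for one trace span (hypothetical shape)."""
    name: str      # pipeline stage name, e.g. "build"
    start_ms: int  # start time in milliseconds
    end_ms: int    # end time in milliseconds


def time_by_stage(spans):
    """Sum span durations per stage name, in milliseconds."""
    totals = defaultdict(int)
    for span in spans:
        totals[span.name] += span.end_ms - span.start_ms
    return dict(totals)


def slowest_stage(spans):
    """Return the (stage, total_ms) pair with the largest total duration."""
    totals = time_by_stage(spans)
    return max(totals.items(), key=lambda item: item[1])


# Invented sample data for a single pipeline run.
run = [
    Span("checkout", 0, 4_000),
    Span("build", 4_000, 64_000),
    Span("test", 64_000, 94_000),
    Span("deploy", 94_000, 100_000),
]

if __name__ == "__main__":
    print(time_by_stage(run))
    print(slowest_stage(run))
```

In a real setup this kind of breakdown would usually be done by the tracing backend's query layer rather than by hand. The sketch only shows the sort of question that span data can answer and that job status and console logs alone cannot.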