The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

A New Scale Tier for Amazon Timestream for InfluxDB

InfluxDB 3 on Amazon Timestream for InfluxDB now scales to 15-node clusters, unlocking higher ingestion, greater query concurrency, and real-time performance at scale. In this video, PM Pete Barnett breaks down what this means for high-resolution, high-velocity workloads, and how you can scale from Core to Enterprise with zero downtime or data migration.

Cost Optimization in Action: How We Cut Amazon SQS Costs by 87%

JC, the Director of Software Engineering, Cloud at LogicMonitor, shares how Cost Optimization enabled his team to shift to Cost-Intelligent Observability and tackle an unexpected and growing cloud bill. As engineers, we live and breathe performance. We obsess over latency, reliability, and uptime, the hallmarks of a healthy system. But there’s another metric that’s becoming just as critical: cost.

Monitor your application and network load balancer logs

Load balancers are the primary entry points to distributed applications. By strategically directing the flow of incoming web traffic to specific endpoints, load balancers help optimize throughput and ensure the horizontal scalability of applications. In modern systems, load balancers often do more than their name suggests: Beyond basic load distribution, they analyze requests and route traffic based on a wide range of variables, such as client identity.

Event Intelligence for Agentic IT Operations

Modern IT teams are experimenting with AI agents. But individual agents working in isolation are not enough. To truly achieve Agentic IT Operations, organisations need a platform that coordinates, governs, and contextualises AI-driven actions across the entire IT landscape. That’s where Interlink Software comes in.

Instrumenting Rust TLS with eBPF

Coroot is an open source observability tool that uses eBPF to collect telemetry directly from applications and infrastructure. One of the things it does is capture L7 traffic from TLS connections without any code changes, by hooking into TLS libraries and syscalls. Works great for OpenSSL. Works for Go. Then rustls enters the picture and everything stops being obvious. With OpenSSL, everything is nicely wrapped. From eBPF’s point of view this is perfect: everything happens inside one call.

Shifting Metrics Right

In the shift-left era, where it feels like we’re pushing everything as far to the start of the SDLC as we can, it may seem counterintuitive to shift anything right. That is, however, exactly what I suggest when it comes to generating metrics. How far to the right of the SDLC you go is a much more nuanced question, and it depends on a lot of factors, including which metrics you’re talking about.

The Hidden Crisis in Modern IT: Interpretation Risk

Technology leaders spent the past decade investing heavily in visibility. They expanded monitoring footprints, adopted cloud-native observability tools, integrated analytics dashboards, and layered on automation intended to streamline detection. Every addition promised deeper insight. Every initiative aimed to bring clarity to increasingly complex environments. Yet operations feel more chaotic, not less. Outages move faster. Incidents cross more boundaries. Signals appear without context.

Fair Source Software in the AI age

Have you noticed AI recently? Yeah, us too. Generative AI is wreaking havoc on the software status quo, and that includes licensing, and that generates … opinions. Sentry has a long history of having opinions about software licensing. We started life as an unlicensed side project in 2008, then went through BSD, to BSL, to writing our own license, FSL.

How GDIT Automated Early Response to Preserve Critical Event Context

In this video, Jason Boig, Solutions Engineer at GDIT, shares how his team uses ScienceLogic to streamline network infrastructure monitoring and improve response times. Instead of relying on manual processes after an alert is triggered, ScienceLogic helps automate the initial response and capture critical data the moment an event occurs. This ensures nothing is lost as conditions change and gives teams immediate visibility into issues.