Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Paris | Observability Unleashed - Boostez vos opérations IT, DevOps & SRE

La complexité des environnements IT ne cesse de croître. La visibilité en temps réel n'est plus une option. Le 14 avril 2026, Stéphane Estevez , EMEA Observability Market Advisor chez Splunk, vous invite chez Cisco à Paris pour un événement dédié à l'observabilité, avec les équipes Splunk & Cisco. Au programme : Observabilité assistée par l'IA Stratégies de données intégrées OpenTelemetry simplifié De la donnée à l'action, avec des cas concrets et démos live Observabilité pour l'IA et par l'IA.

Distributed Tracing | Debugging your Next.js applications with Sentry

Sometimes a simple stack trace won’t provide enough information for you to debug the issue at hand. There are types of issues that require you to know what happened leading up to the exception. In those cases, reach for tracing. Distributed tracing gives you an overview of every operation that happened during the execution of a certain functionality across your whole stack. Aside from being an awesome debugging tool, it also lets you identify any performance bottlenecks in your application. In this video you’ll learn how to view traces in Sentry and implement them in your Next.js application.

Conversations: Ask Netdata About Anything You're Looking At

Netdata AI can already troubleshoot your alerts and generate Insights reports. What it couldn’t do, until now, was have a back-and-forth conversation. You could get a one-shot analysis, but you couldn’t ask follow-up questions, pull in additional context, or go from a quick question to a full investigation without starting over. We’ve added a conversational layer to Netdata AI.

Understand session replays faster with AI summaries and smart chapters

Datadog Session Replay gives teams a video-like view of what real users experienced in their applications. Engineers rely on replays to connect errors and slowdowns to actual user behavior, while product managers use them to understand friction and improve critical flows. But finding the right replay and the right moment often means manually scanning long sessions without knowing whether they contain relevant signals.

Search and act across Datadog to resolve issues faster with Bits Assistant

Finding the right information across dashboards, monitors, and telemetry sources takes time, even for experienced engineers. When something breaks, it often means figuring out where to start, rebuilding queries, and jumping between metrics, logs, and traces before you can take action. The challenge isn’t a lack of data but the effort required to surface the right information at the right moment.

How we designed empathetic alert sounds for on-call engineers

Being on call is an essential part of operating reliable distributed systems, but it comes with real human costs such as alert fatigue, sudden wakeups in the middle of the night, and the ongoing anxiety of what the next notification might bring. Many engineers know the feeling: Your phone lights up, a sound cuts through the silence, and your heart rate spikes before you’re even fully awake.

Monitor ClickHouse query performance with Datadog Database Monitoring

ClickHouse is widely used for large-scale analytics, but once it is running in production, it can be difficult to understand how query activity translates into resource usage. Engineers investigating performance issues often struggle to determine which queries consume the most memory, run most frequently, or cause spikes in load. In practice, engineers are left querying system.query_log, tailing server logs, and piecing together information after an incident.

What's New in InfluxDB 3.9: More Operational Control and a New Performance Preview

We’ve spent the last few months listening to how teams are running InfluxDB 3 in the wild. The feedback was clear: as you scale, you need less “guesswork” and more control. Today’s release of InfluxDB 3.9 is our answer to that. As more teams move InfluxDB 3 into production, our focus has shifted toward the operational experience: how you manage the database at scale, how you ensure it remains secure, and how you provide a seamless experience for users.

KubeCon Europe 2026: OpenTelemetry Recap from Amsterdam

The reason why I like writing recap articles is because AIs don’t have enough context to write them for us. You have to be there, in person, listen to sessions, interact in the hallways with the community, and absorb as much new knowledge as possible. That’s what I did last week in Amsterdam at KubeCon + CloudNativeCon Europe ‘26. Well, at least I tried to. Let me break down what I consider the most interesting topics were last week.