The latest news and information on monitoring for websites, applications, APIs, infrastructure, and other technologies.

RUM Versions: one-click deployment tracking

Deployments should drive your product forward, not slow you down. Yet too often, teams spend hours digging through logs, dashboards, and error reports just to answer a simple question: did the release go smoothly? Coralogix’s new Versions feature answers this in a single click, letting teams spend more time building and less time investigating.

5 Notable Examples of Network Maps and Diagrams

A network map is a visual representation of the devices and connections that make up an IT network. For IT professionals, network maps are essential tools for monitoring performance, troubleshooting issues, enhancing security and planning infrastructure upgrades. There are multiple types of network maps, each serving a specific purpose, ranging from physical layout diagrams to cloud-based and security-oriented architectures.

Introducing new issue detectors: Spot latency, overfetching, and unsafe queries early

Not everything in production is on fire. Sometimes it’s just... a little warm. A page that loads a second too slow. An API that returns way more than anyone asked for. A query that feels totally fine until someone sends something unexpected and suddenly you’ve got an incident.

Is Your "Single Pane of Glass" Leaving You Blind to the Real Problem?

In the push to simplify IT management, the idea of a single, all-encompassing AIOps platform is certainly appealing. The promise of one dashboard to monitor the entire IT stack—from applications and infrastructure to the network—suggests a world of streamlined operations. This generalist approach aims to provide a broad overview, correlating data from across the business to spot trends and potential issues.

What's New in InfluxDB 3.3: Managed Plugins, Explorer Updates, and More

InfluxDB 3.3 is now available for both Core and Enterprise. It introduces new managed plugins for the Processing Engine, making it easier to address common time series tasks with just a plugin. On top of that, 3.3 includes a wide range of performance improvements, feature updates, and bug fixes. InfluxDB 3 Core is free and open source, optimized for recent data, and licensed under MIT and Apache 2.

Building an Incident Response Playbook: Templates and Examples

An incident response playbook is your team's emergency manual when things go wrong. It's a documented set of procedures that guides your team through detecting, responding to, and resolving incidents efficiently. Without one, teams often scramble during outages, make inconsistent decisions, and take longer to restore service.

Azure native integration elevates Elastic Cloud Serverless experience

We're thrilled to announce a significant leap forward in making Elastic Cloud Serverless even more accessible and powerful for Azure users. With the general availability (GA) of Elastic Cloud Serverless on Azure, we've just released the Azure native integration for Elastic Cloud Serverless. This builds upon our existing Azure native integration for Elastic Cloud Hosted, allowing users to seamlessly discover and manage Elastic Cloud in a way that feels inherently part of the Azure ecosystem.

Bring high-performance observability to secure Kubernetes environments with Datadog's new CSI driver

In Kubernetes environments, applications often communicate with the Datadog Agent to send telemetry data such as custom metrics via DogStatsD or traces through Datadog APM. How this communication takes place depends on the communication mode set on the Datadog Cluster Agent's Admission Controller. With the sockets option, communication takes place through local inter-process communication via Unix domain sockets (UDS), whereas the service and default hostip options rely on network communication.
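
To make the difference between the two transport styles concrete, here is a minimal sketch of an application emitting a DogStatsD metric either over a Unix domain socket or over UDP. It uses only the Python standard library rather than an official Datadog client. The environment variable names (`DD_DOGSTATSD_URL`, `DD_AGENT_HOST`, `DD_DOGSTATSD_PORT`), the socket path in the usage note, and the default port 8125 are assumptions about what a typical setup provides, and the helper names are hypothetical.

```python
import socket
from typing import Mapping, Sequence, Tuple, Union

# ("unix", "/path/to.sock") or ("udp", ("host", port))
Target = Tuple[str, Union[str, Tuple[str, int]]]


def format_metric(name: str, value: float, metric_type: str = "c",
                  tags: Sequence[str] = ()) -> bytes:
    """Build a DogStatsD datagram: <name>:<value>|<type>[|#tag1,tag2]."""
    payload = f"{name}:{value}|{metric_type}"
    if tags:
        payload += "|#" + ",".join(tags)
    return payload.encode("utf-8")


def resolve_target(env: Mapping[str, str]) -> Target:
    """Pick a transport from environment variables (names are assumptions)."""
    url = env.get("DD_DOGSTATSD_URL", "")
    if url.startswith("unix://"):
        # Socket-style mode: local IPC through a Unix domain socket.
        return ("unix", url[len("unix://"):])
    # Network-style modes: UDP to a host IP or service address.
    host = env.get("DD_AGENT_HOST", "127.0.0.1")
    port = int(env.get("DD_DOGSTATSD_PORT", "8125"))
    return ("udp", (host, port))


def send_metric(payload: bytes, target: Target) -> None:
    """Send one datagram over the resolved transport."""
    kind, address = target
    family = socket.AF_UNIX if kind == "unix" else socket.AF_INET
    with socket.socket(family, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, address)
```

In a pod, a call such as `send_metric(format_metric("page.views", 1, "c", ["env:prod"]), resolve_target(os.environ))` (after `import os`) would use the socket path when a `unix://` URL such as `unix:///var/run/datadog/dsd.socket` is present and fall back to UDP otherwise. The application code stays the same in both cases; only the injected environment differs.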

Datadog Disaster Recovery mitigates cloud provider outages

A loss of infrastructure and application observability can leave SRE and DevOps teams without insight into the real-time state of their production systems, forcing them to temporarily pause code deployments and limiting their ability to troubleshoot issues or respond to critical alerts. In modern cloud environments, where services are distributed and deeply interconnected, this lack of visibility can escalate quickly.