
The latest news and information on monitoring for websites, applications, APIs, infrastructure, and other technologies.

What Are Traces? A Developer's Guide to Distributed Tracing

One of the most common challenges in software engineering today is understanding how requests flow through an application. As architectures shift toward widely distributed, cloud-native designs, tracking how an application processes a user action is harder than ever, because a single action may trigger work in dozens of backend services. Traces help developers meet this challenge by recording the path each request takes through those services.
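To make the idea concrete, here is a minimal sketch of how spans from different services are stitched into one trace. It assumes the W3C Trace Context `traceparent` header format (`version-trace_id-parent_id-flags`) for cross-service propagation; the `Span` class, the helper functions, and the service names are hypothetical illustrations, not the API of any tracing library such as OpenTelemetry.

```python
import secrets
import time
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    """One unit of work in one service; spans sharing a trace_id form a trace."""
    name: str
    service: str
    trace_id: str                      # 32 hex chars, shared by every span in the trace
    span_id: str = field(default_factory=lambda: secrets.token_hex(8))  # 16 hex chars
    parent_id: Optional[str] = None    # span_id of the caller; None for the root span
    start: float = field(default_factory=time.time)
    end: Optional[float] = None

    def child(self, name: str, service: str) -> "Span":
        # In-process call: same trace, and this span becomes the parent.
        return Span(name=name, service=service, trace_id=self.trace_id, parent_id=self.span_id)

    def finish(self) -> None:
        self.end = time.time()

    def traceparent(self, sampled: bool = True) -> str:
        # W3C Trace Context header: version-trace_id-parent_id-flags
        return f"00-{self.trace_id}-{self.span_id}-{'01' if sampled else '00'}"


def start_trace(name: str, service: str) -> Span:
    """Create the root span for a new user action with a fresh 128-bit trace id."""
    return Span(name=name, service=service, trace_id=secrets.token_hex(16))


def continue_trace(header: str, name: str, service: str) -> Span:
    """Downstream service: rebuild the trace context from an incoming traceparent header."""
    _version, trace_id, parent_id, _flags = header.split("-")
    return Span(name=name, service=service, trace_id=trace_id, parent_id=parent_id)


# One user action touching two hypothetical services.
root = start_trace("POST /checkout", "frontend")
cart = root.child("load cart", "frontend")                                   # same process
payment = continue_trace(root.traceparent(), "charge card", "payment")      # over the network
spans: List[Span] = [root, cart, payment]
for s in spans:
    s.finish()
    print(s.service, s.name, s.trace_id[:8], s.span_id, s.parent_id)
```

Because every span carries the same `trace_id` plus a pointer to its parent, a tracing backend can reassemble the spans into a tree showing where time was spent across services.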

Datadog named Leader in 2025 Gartner Magic Quadrant for Observability Platforms

We are thrilled to announce that, for the fifth consecutive year, Datadog has been named a Leader in the 2025 Gartner Magic Quadrant for Observability Platforms. We believe that this recognition reflects our continued focus on helping customers observe, secure, and act on everything that matters across their technology stack.

What Are Log Loss and Cross-Entropy?

You're building a classification model, and your framework throws around terms like "log loss" and "cross-entropy loss." Are they the same thing? When should you use binary cross-entropy versus categorical cross-entropy? What about focal loss? This post breaks down these loss functions with practical examples and real-world implementations.
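As a quick orientation, the sketch below implements the three losses in plain Python. "Log loss" is the name commonly used for binary cross-entropy averaged over samples; categorical cross-entropy generalizes it to one-hot targets over K classes; focal loss down-weights easy examples by a factor of (1 - p_t)^gamma. The clipping constant and the defaults `gamma=2.0`, `alpha=0.25` are assumptions chosen for illustration, not values taken from the post.

```python
import math
from typing import Sequence

EPS = 1e-15  # clip probabilities so log(0) is never evaluated


def _clip(p: float) -> float:
    return min(max(p, EPS), 1 - EPS)


def binary_cross_entropy(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Log loss: mean of -[y*log(p) + (1-y)*log(1-p)] over samples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = _clip(p)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def categorical_cross_entropy(y_onehot: Sequence[Sequence[int]],
                              y_probs: Sequence[Sequence[float]]) -> float:
    """Mean over samples of -sum_k y_k * log(p_k) for one-hot targets."""
    total = 0.0
    for target, probs in zip(y_onehot, y_probs):
        total += -sum(t * math.log(_clip(q)) for t, q in zip(target, probs))
    return total / len(y_onehot)


def binary_focal_loss(y_true: Sequence[int], y_prob: Sequence[float],
                      gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Alpha-balanced focal loss: mean of -alpha_t * (1 - p_t)**gamma * log(p_t)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = _clip(p)
        p_t = p if y == 1 else 1 - p                 # probability assigned to the true class
        alpha_t = alpha if y == 1 else 1 - alpha     # class-balancing weight
        total += -alpha_t * (1 - p_t) ** gamma * math.log(p_t)
    return total / len(y_true)


y = [1, 0, 1, 1]
p = [0.9, 0.1, 0.6, 0.95]
print(binary_cross_entropy(y, p))
print(categorical_cross_entropy([[0, 1], [1, 0], [0, 1], [0, 1]],
                                [[1 - q, q] for q in p]))  # same value: two-class case
print(binary_focal_loss(y, p))
```

With two classes, categorical cross-entropy over `[1 - p, p]` reduces exactly to binary cross-entropy, and setting `gamma=0` turns focal loss back into a weighted cross-entropy.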

Cloud Log Management: A Developer's Guide to Scalable Observability

As systems move to microservices, serverless, and multi-cloud setups, debugging gets harder. You’re no longer dealing with a single log file; you’re looking at logs from dozens of services, running across different environments. Traditional debugging methods like SSH-ing into servers or adding print statements don’t scale in these environments. Cloud log management tools help by collecting logs from all your services into one place.
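Centralizing logs works best when every service emits machine-parseable records tagged with where they came from. The following is a minimal sketch using Python's standard `logging` module that writes one JSON object per line with `service` and `environment` fields; the field names and the `make_logger` helper are assumptions for illustration, not the schema of any particular cloud log management tool.

```python
import json
import logging
import sys
from datetime import datetime, timezone
from typing import TextIO


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line a log collector can ship and index."""

    def __init__(self, service: str, environment: str) -> None:
        super().__init__()
        self.service = service
        self.environment = environment

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "service": self.service,          # lets the central store filter by service
            "environment": self.environment,  # e.g. staging vs production
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def make_logger(service: str, environment: str, stream: TextIO = sys.stdout) -> logging.Logger:
    """Hypothetical helper: a logger that writes JSON lines to stdout for a collector to pick up."""
    logger = logging.getLogger(service)
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter(service, environment))
    logger.handlers.clear()       # avoid duplicate output if called twice
    logger.addHandler(handler)
    logger.propagate = False
    return logger


logger = make_logger("checkout-service", "production")
logger.info("order placed")
```

Writing to stdout rather than local files is a common choice in containerized environments, since the platform's log agent can collect the stream without SSH access to the host.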

How to improve your observability

Coroot was designed to solve the problem of time-consuming root cause analysis. It covers the full observability journey, from collecting telemetry automatically with zero-code setup (thanks, eBPF!) to simplifying the work of SREs and DevOps teams everywhere with instant, AI-powered root cause analysis. We also strongly believe that simple observability should be an innovation everyone can afford, which is why our software is open source!

How We Made Our Queries 99.5% Faster

We cut log-query scanning from ~100% of data blocks to < 1% by reorganizing how logs are stored in ClickHouse. Instead of relying on bloom-filter skip indexes, we generate a deterministic “resource fingerprint” (a hash of cluster + namespace + pod, etc.) for every log source and sort the table by this fingerprint in the primary key’s ORDER BY clause. This packs logs from the same pod or service contiguously, letting ClickHouse’s sparse primary-key index skip irrelevant blocks.
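The effect of sorting by fingerprint can be simulated outside ClickHouse. This sketch assumes a SHA-256 hash truncated to 64 bits as the fingerprint, hypothetical cluster/namespace/pod names, and a tiny granule size of 4 rows (ClickHouse's default `index_granularity` is 8192); it approximates the sparse primary index by keeping each granule's smallest and largest key, which is not how ClickHouse stores marks internally but gives the same skipping behavior on sorted data.

```python
import hashlib
from typing import Dict, List, Tuple

GRANULE_ROWS = 4  # illustration only; ClickHouse defaults to 8192 rows per granule


def resource_fingerprint(resource: Dict[str, str]) -> int:
    """Deterministic 64-bit hash of a log source's identifying attributes."""
    # Sorting keys makes the hash independent of dict insertion order.
    canonical = "|".join(f"{k}={resource[k]}" for k in sorted(resource))
    digest = hashlib.sha256(canonical.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def granule_ranges(keys: List[int]) -> List[Tuple[int, int]]:
    """(min, max) key per granule, a stand-in for the sparse primary-key index."""
    return [
        (min(chunk), max(chunk))
        for chunk in (keys[i:i + GRANULE_ROWS] for i in range(0, len(keys), GRANULE_ROWS))
    ]


def granules_to_read(ranges: List[Tuple[int, int]], fp: int) -> int:
    """Granules that cannot be skipped when filtering on fingerprint == fp."""
    return sum(1 for lo, hi in ranges if lo <= fp <= hi)


# Hypothetical sources: 3 clusters x 4 namespaces x 5 pods = 60 pods.
sources = [
    {"cluster": f"c{c}", "namespace": f"ns{n}", "pod": f"pod-{p}"}
    for c in range(3) for n in range(4) for p in range(5)
]
# 20 log lines per pod, arriving interleaved in time (round-robin across pods).
arrival_order = [resource_fingerprint(src) for _ in range(20) for src in sources]

unsorted_ranges = granule_ranges(arrival_order)          # stored in arrival order
sorted_ranges = granule_ranges(sorted(arrival_order))    # ORDER BY fingerprint

target = resource_fingerprint(sources[7])
print("total granules:", len(sorted_ranges))
print("read, arrival order:", granules_to_read(unsorted_ranges, target))
print("read, sorted by fingerprint:", granules_to_read(sorted_ranges, target))
```

In arrival order, each of the target pod's 20 lines lands in a different granule, so at least 20 granules must be read; once sorted by fingerprint, its lines are contiguous and only the granules holding them survive the index check.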

Visibility Is the First Line of Defense: Operational Readiness in a Zero Trust World

As global cyber threats continue to evolve at unprecedented speed, the United States public sector faces growing pressure to enhance operational readiness. Agencies must now contend with adversaries who are not only well-funded but also increasingly sophisticated in their ability to exploit visibility gaps. In the face of this dynamic threat landscape, the Zero Trust Architecture (ZTA) model has become an essential security framework.

Getting started with Elasticsearch dashboards

Elasticsearch is one of the IT and software industry’s most established platforms for storing and analyzing log data. As its name suggests, it also has a powerful search and analytics engine, driven by the Elasticsearch query language. Elasticsearch itself is essentially a backend store, so if you want to explore and analyze your data, you will need a visualization layer such as SquaredUp and our Elasticsearch plugin.
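For a sense of what a dashboard tile asks the backend for, here is a sketch that builds an Elasticsearch Query DSL search counting log lines per hour. It assumes an index pattern `logs-*`, a `@timestamp` field, a `service.name` field (Elastic Common Schema naming), and a cluster at `http://localhost:9200`; all of these are illustrative assumptions, and the request is only constructed, not sent.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def log_volume_query(hours: int = 24, interval: str = "1h",
                     service: Optional[str] = None) -> Dict[str, Any]:
    """Query DSL body: log counts bucketed over time, with no individual hits returned."""
    filters = [{"range": {"@timestamp": {"gte": f"now-{hours}h"}}}]
    if service is not None:
        filters.append({"term": {"service.name": service}})  # assumed ECS field name
    return {
        "size": 0,  # only the aggregation is needed for a chart
        "query": {"bool": {"filter": filters}},
        "aggs": {
            "logs_over_time": {
                "date_histogram": {"field": "@timestamp", "fixed_interval": interval}
            }
        },
    }


def build_search_request(base_url: str, index: str, body: Dict[str, Any]) -> urllib.request.Request:
    """Prepare a POST to the _search endpoint; urllib.request.urlopen(req) would send it."""
    return urllib.request.Request(
        url=f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_search_request("http://localhost:9200", "logs-*", log_volume_query(service="checkout"))
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

A visualization layer then plots the returned `logs_over_time` buckets; Elasticsearch itself only stores the data and answers the query.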

Top 3 reporting tools for Microsoft Teams: SquaredUp, Power BI & M365 Admin Center

Microsoft Teams is a ubiquitous presence in workplaces around the world. Before 2020, its usage was relatively moderate, at around 20 million users. However, global restrictions during the pandemic drove 3,500% growth. Teams is now so central to business operations that Microsoft retired Skype in its favor. But this massive scale created a new problem: businesses needed better ways to monitor and report on their Teams usage.