The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Monitoring AI Proxies to optimize performance and costs

Businesses deploying LLM workloads increasingly rely on LLM proxies (also known as LLM gateways) to simplify model integration and governance. Proxies provide a centralized interface across LLM providers, govern model access and usage, and apply compliance safeguards, which reduces operational complexity and makes LLM usage more consistent and scalable.

How we use RUM to make design decisions that enhance user experience

Before we started using Datadog Real User Monitoring (RUM), we relied on frontend logging to gather data about the user experience. Logs gave us some helpful information about exceptions and errors but didn't provide any insight into issues directly related to the user’s perspective.

Turning Network Telemetry into Network Intelligence

By applying data engineering and machine learning to raw network telemetry, it’s possible to surface insights that would otherwise go unnoticed. Learn how this approach helps teams detect anomalies in real time, forecast capacity needs, and automate responses across complex, multi-domain environments.

IT Performance Challenges: Why They Persist and How to Solve Them for Good

IT Ops Problem Solver Series – Part 2: This article summarizes a full report from our IT Ops Problem Solver Series, in which we tackle the biggest problems facing IT Ops leaders and explore how some of Galileo’s clients are addressing them. In this part, we delve into IT performance challenges and how to address them effectively.

Getting Started with SolarWinds Orion Dashboards

SolarWinds is a popular on-prem IT infrastructure monitoring tool, best known for its network and server monitoring capabilities. While it offers rich telemetry, it’s easy to miss the bigger picture. SquaredUp turns this complex monitoring data into clear, shareable dashboards that make it easier to spot trends, catch issues early, and keep everyone on the same page.

Evaluating Synthetic Monitoring Platforms: What to Look for in 2025

Synthetic monitoring simulates user interactions with applications to proactively identify performance issues before they impact real users. Modern distributed systems require sophisticated monitoring capabilities to effectively test microservices, APIs, and complex user journeys across diverse environments. This article provides a framework to evaluate synthetic monitoring platforms in 2025.

Guide to Monitoring Apache Flink Using OpenTelemetry and MetricFire

Apache Flink is an open-source, distributed stream processing engine built for real-time, high-throughput data pipelines. It excels at processing continuous data streams with low latency, making it a great fit for use cases like fraud detection, log analytics, real-time dashboards, personalized recommendations, and IoT telemetry.

AI's Unrealized Potential: Honeycomb and DORA on Smarter, More Reliable Development with LLMs

Charity Majors, CTO and Co-founder at Honeycomb, and Phillip Carter, Principal Product Manager at Honeycomb, recently hosted a webinar with DORA's Nathen Harvey on AI's unrealized potential. As part of this, we created a 3-minute highlight reel of the webinar that you can watch.

Why a No-Index Observability Architecture is Essential

When was the last time you asked about the architecture behind your observability provider? For most IT professionals, whether in development, operations, or security, it’s not a question that naturally comes up. Yet this architectural detail could be the difference between insight at scale and runaway costs. People are drawn to the features, the shiny things that promise to unlock insight, drive faster response times, and tighten security.