
The latest news and information on monitoring for websites, applications, APIs, infrastructure, and other technologies.

How Prometheus Exporters Work With OpenTelemetry

Running distributed systems means you need clear visibility into how your services behave. Prometheus has been the standard for metrics for a long time, and OpenTelemetry is now giving teams a more consistent way to collect telemetry across their stack. In many setups, you'll have both: existing Prometheus instrumentation that's already in place, and new components instrumented with OpenTelemetry.

Key learnings from the State of Containers and Serverless report

We recently released the 2025 State of Containers and Serverless report, which examines cloud usage data from tens of thousands of Datadog customers. The study shows adoption trends across container orchestration platforms and serverless offerings, and it explores how organizations use those resources to optimize workloads for efficiency, cost, and simplicity.

Catch and remediate ECS issues faster with default monitors and the ECS Explorer

Organizations that run applications on Amazon Elastic Container Service (Amazon ECS) often juggle signals across container and task metrics, logs, and events while they hunt for the change or condition that broke a deployment. This work adds operational overhead and extends incident timelines as teams switch between tools and manually correlate symptoms.

Import Snowflake, Salesforce, ServiceNow, and Databricks metadata into Datadog with Reference Tables

Engineering, operations, and security teams can struggle to make sense of their telemetry data in isolation. Logs, metrics, and events tell what is happening but are often missing critical metadata like who owns what, where it's coming from, or indicators of attack. These gaps in visibility slow down incident response, complicate cost control, and make business or security analytics much harder.

Why Email Blacklist Monitoring Matters?

Email deliverability determines whether your messages reach inboxes or disappear without notice. When your domain or mail server appears on a blacklist, communication stops instantly, affecting customers, partners, and revenue. Blacklisting can happen silently, even to legitimate senders. Continuous email blacklist monitoring ensures that issues are detected early, keeping your reputation strong and your communication uninterrupted.

When payments pause: lessons from a global payments outage

In digital commerce, payment reliability is non-negotiable. The rise of instant payments highlights this need: global instant payment transaction volume reached 195 billion in 2022, with projections to surpass 500 billion transactions by 2027 as more countries adopt faster payment systems. This growing reliance on real-time payment rails raises the stakes for reliability, with any disruption posing major risks to trust and revenue.

Atatus 2025 Highlights: G2 Wins and Product Milestones

As we approach 2026, we’re taking a moment at Atatus to reflect on a year that pushed us forward in every way. 2025 was about raising the bar by expanding integrations, deepening data insights, broadening language support, and rolling out new capabilities that empower teams to see more and do more. Most importantly, the response from our customers and community made it clear that the work we’re doing is making a real difference.

How to Use MetricFire Logging: Visualize Logs & Metrics Together in Grafana

Want full visibility into your systems? In this step-by-step tutorial, we show you how to use Grafana Loki with Promtail on Hosted Graphite by MetricFire to stream logs alongside your metrics, all visualized in Grafana dashboards. No more toggling between tools: get the full observability stack in one place.

Choosing the Right Load Balancing Approach for Your Network: Static, Dynamic, & Advanced Techniques

Load balancing is the process of distributing network traffic among multiple server resources. By spreading a workload evenly among the computing resources, this “balanced load” improves application responsiveness and accommodates unexpected traffic spikes without compromising application performance. Let’s take a deeper look at this important networking function.
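
To make the static versus dynamic distinction in the title concrete, here is a minimal Python sketch. It is not taken from the article: the server names, the starting connection counts, and the helper functions `round_robin` and `least_connections` are all hypothetical, and a real load balancer would also track health checks and connection teardown.

```python
from itertools import cycle


def round_robin(servers):
    """Static strategy: hand out servers in a fixed rotating order."""
    rotation = cycle(servers)
    return lambda: next(rotation)


def least_connections(active):
    """Dynamic strategy: pick the server with the fewest active connections."""
    def pick():
        server = min(active, key=active.get)
        active[server] += 1  # the chosen server now holds one more connection
        return server
    return pick


# Hypothetical server names, for illustration only.
servers = ["app-1", "app-2", "app-3"]

pick_static = round_robin(servers)
static_assignments = [pick_static() for _ in range(6)]

# Assumption: app-1 is already busy with 5 connections.
pick_dynamic = least_connections({"app-1": 5, "app-2": 0, "app-3": 1})
dynamic_assignments = [pick_dynamic() for _ in range(4)]

print(static_assignments)
print(dynamic_assignments)
```

The static picker ignores current load and simply rotates through the list, while the dynamic picker steers new requests away from the server that is already busy.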