The latest News and Information on Application Performance Monitoring and related technologies.

Optimize Your Oracle Cloud (OCI) Spend with Datadog Cloud Cost Management

Support for Oracle Cloud Infrastructure (OCI) is now live in Datadog Cloud Cost Management. In this short demo, you’ll learn how to get granular visibility into OCI cost and usage by service, compartment, tag, and resource tier; uncover savings opportunities by combining cost data with observability metrics like CPU, memory, and storage utilization; and set up anomaly monitors and budgets to avoid cost overruns, especially for high-risk workloads like AI and GPU training.

Datadog Bits AI SRE: Your new teammate for on-call shifts

Bits AI SRE is an always-on SRE agent built to handle complex troubleshooting and late-night alerts. Developed against thousands of real-world incidents and powered by Datadog’s platform, Bits AI SRE analyzes your entire stack, tests hypotheses, and identifies root causes in minutes. Resolve faster, get back to sleep sooner, and give your on-call team the confidence and capacity they need.

Patterns for Deploying OpenTelemetry Collector at Scale

So, you've embraced OpenTelemetry, and it's been great. Pat, pat. That single, vendor-neutral pipeline for your traces, metrics, and logs felt like the future. But now, the future is getting bigger. That simple OTel Collector configuration that worked perfectly for a few services is starting to show its limits as you scale. The data volume is climbing, reliability is becoming a concern, and you're wondering if that single collector instance is now a bottleneck waiting to happen.

Amazon AppStream 2.0 Multi-session Service Monitoring

In late 2023, Amazon introduced the ability to deliver AppStream 2.0 using the Microsoft Windows Server OS rather than a desktop edition of the OS. This feature enables IT admins to host multiple end-user sessions on a single AppStream 2.0 instance, helping to make better use of instance resources.

Golang Monitoring Guide - Traces, Logs, APM and Go Runtime Metrics

Golang (Go) applications are known for their high performance, concurrency model, and efficient resource use, making Go an easy choice for building modern distributed systems. But just because your Go application is built for speed doesn't mean it's running perfectly in production. When things go wrong, just checking if your service is "UP" isn't enough.

Beginner's Guide to OpenTelemetry & Django (2025)

Django is a popular open-source "batteries-included" Python web framework that enables rapid development while taking much of the hassle out of routine web development. By providing pre-built components like ORM integrations, authentication/authorization systems, and more, it lets developers focus on business logic and iterate fast. As a result, developers and organizations worldwide use Django to build web apps of varying complexity.

What is OpenTelemetry? [Everything You Need to Know]

Observability used to be a fragmented mess. You had one agent for logs, a different library for metrics, and a proprietary SDK for distributed tracing. If you wanted to switch vendors, you had to rewrite your instrumentation code from scratch. OpenTelemetry (OTel) fixed this. It has become the second most active project in the CNCF (Cloud Native Computing Foundation), right behind Kubernetes.