Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Increase control and reduce noise in your AWS logs using Datadog Observability Pipelines

Today’s SRE and security operations center (SOC) teams often find themselves overwhelmed by the sheer volume and variety of logs generated by critical AWS services such as VPC Flow Logs, AWS WAF, and Amazon CloudFront. While these logs can be valuable for detecting and investigating security threats, as well as troubleshooting issues in your environment, managing them at scale can be challenging and costly.

Breaking Free from Legacy Observability: Why Service Providers Choose Kentik Over Deepfield

Modern network operators need modern observability tools. In this post, we explore why Deepfield, a traditional network flow analytics platform, falls short of providing the comprehensive insights required for today’s network operations, and how Kentik’s modern data platform is purpose-built for today’s infrastructure teams.

How Forbes delivers a premium digital experience with Datadog

Learn how Forbes, a global media powerhouse, successfully migrated to the cloud with Datadog. Discover how they enabled their teams to access IT data across their entire tech stack and make critical improvements. The team maintained 99.5 percent uptime through proactive alerting and improved root cause analysis by 10 percent.

Making sure you get a Checkly alert for every detected failure

It’s every ops team’s biggest anxiety: a monitoring system detects a failure, but the notification either isn’t delivered or isn’t noticed by the team. Now we have to wait for users to complain before our team knows about the problem. Checkly sends an alert every time the system detects a failure, but how can you be sure you’re getting those alerts, and that those alerts are going to the right people?

How to Use OpenSearch with Python for Search and Analytics

If you're working with search and analytics, you’ve probably heard about OpenSearch—the open-source alternative to Elasticsearch. OpenSearch is a powerful tool, whether you're building a search engine, running log analytics, or implementing full-text search in your applications. And the best part? You can integrate it easily with Python.
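As a rough illustration of what the Python side of such an integration might look like, the sketch below builds a search request body in the query DSL using only the standard library. The field names `message` and `@timestamp` and the helper `build_search_body` are assumptions made for this example, not taken from the article; sending the body to a cluster would additionally need a client library and a running OpenSearch instance, which are not shown.

```python
import json


def build_search_body(text, field="message", since="now-1h", size=10):
    """Build a query DSL body: a full-text match plus a time-range filter.

    The field names are hypothetical; adjust them to your own index mapping.
    """
    return {
        "size": size,
        "query": {
            "bool": {
                # Full-text match on the chosen field.
                "must": [{"match": {field: text}}],
                # Restrict results to documents newer than `since`.
                "filter": [{"range": {"@timestamp": {"gte": since}}}],
            }
        },
    }


body = build_search_body("timeout")
print(json.dumps(body, indent=2))
```

Keeping the body as a plain dictionary makes it easy to inspect or unit test before any request is sent.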

OpenTelemetry Visualization Setup: A Developer's Guide

If you've ever tried to set up OpenTelemetry visualization, you know it can be a bit overwhelming. But don't worry—in this guide, we'll break it all down step by step. Whether you're just getting started or looking to fine-tune your existing setup, this walkthrough will help you get the most out of your telemetry data.

It was DNS Again: Why Your Status Page Needs Its Own Domain

On February 20, 2025, at 16:22 UTC, StatusGator detected an outage affecting Vultr. The issue appeared to stem from a DNS failure, causing vultr.com and any other services hosted on its domain to become inaccessible. But what does that include? The official Vultr status page. Because Vultr hosts its status page on status.vultr.com, a subdomain of the same domain that hosts its primary website and dashboard, users were left without an official source of updates during the outage.
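The dependency described here can be checked mechanically. The following is a minimal sketch, assuming a simple hostname suffix comparison is enough to flag the risk; the function name `shares_domain` and the host `status.example.net` are hypothetical, and the check does not inspect nameservers or DNS providers, which can also be shared across different domains.

```python
def shares_domain(status_host: str, primary_domain: str) -> bool:
    """Return True if status_host is primary_domain or one of its subdomains.

    This is a naive suffix check on hostnames only; it performs no DNS lookups.
    """
    status_host = status_host.lower().rstrip(".")
    primary_domain = primary_domain.lower().rstrip(".")
    return status_host == primary_domain or status_host.endswith("." + primary_domain)


# The status page from the incident sits under the primary domain.
same_domain = shares_domain("status.vultr.com", "vultr.com")

# A status page on a separate, hypothetical domain does not.
separate_domain = shares_domain("status.example.net", "vultr.com")
```

A result of `True` suggests that a DNS failure on the primary domain could take the status page down with it.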

Getting Ready with Regex 101

If you’ve dropped your house key in tall grass, you know how difficult it is to locate a small item hiding in an overgrown field. Perhaps you borrowed a metal detector from a friend, then returned to the field hoping to hear the loud beep that signals metal in an otherwise organic area. Trying to find patterns in strings of data is much the same process.
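To make the metal-detector analogy concrete, here is a small sketch that uses a regular expression to pick error entries out of a few log lines. The log lines, their format, and the helper `find_errors` are invented for illustration and are not taken from the article.

```python
import re

# Hypothetical log lines, invented for this example.
LOG_LINES = [
    "2025-02-20 16:22:01 ERROR dns lookup failed for host=example.com",
    "2025-02-20 16:22:05 INFO request served in 120ms",
    "2025-02-20 16:23:10 ERROR upstream timeout after 3000ms",
]

# One pattern with named groups: date, time, level, then the rest of the line.
LINE_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<message>.*)$"
)


def find_errors(lines):
    """Return (time, message) pairs for lines whose level is ERROR."""
    hits = []
    for line in lines:
        match = LINE_PATTERN.match(line)
        if match and match.group("level") == "ERROR":
            hits.append((match.group("time"), match.group("message")))
    return hits


errors = find_errors(LOG_LINES)
```

Named groups keep the pattern readable and let the calling code refer to fields by name rather than by position.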