
The latest news and information on monitoring for websites, applications, APIs, infrastructure, and other technologies.

A Fresh Look Without Moving the Cheese

After 12 years of faithful service, the TrackJS interface was starting to show its age. Not that it wasn’t working—it was still doing exactly what our customers needed it to do. But when you’re staring at Bootstrap styles from 2012 and a version of LESS that might be officially defunct, it’s probably time for a refresh.

Discover powerful insights with nested metric queries

To gain adequate visibility into your distributed applications, you need to observe those applications at different levels of granularity. This means that you need to be able to query collected telemetry data both at the level of the whole application and at the level of selected components. Thanks to the power of Datadog tagging, you can already do this by aggregating your metrics within any scope of your choosing.

How to import Prometheus-style alerts and recording rules to Grafana-managed alerts and recording rules

Grafana Alerting has evolved dramatically since the legacy dashboard-alert days. Today, Grafana-managed alerts power enterprise-scale monitoring in Grafana Cloud and on-prem installations. And over the last two years, we’ve added RBAC, state history, versioning, and much more. At the same time, our own monitoring at Grafana Labs relies heavily on Prometheus-style alerts—a situation that’s not uncommon for our users, too.

From Alert to Fix in 10 minutes: How a Slow Query Took Down Placid.app

This is a guest post from Armin Ulrich, a full-stack developer and the founder of placid.app. He also created the MadeWith* network, where he shares his projects and allows other developers to share theirs. There are many things I would rather do at 9pm than track down a mission-critical bug, but sometimes you don’t have a choice. Let me tell you the story of a slow query that led to a cascading failure, and how it could have been worse.

3 AIOps Trends for 2025

As IT environments grow more complex, teams need smarter, faster ways to stay in control. In 2025, three trends are redefining how modern IT operations teams drive efficiency and resilience: Automation Everywhere (offload routine tasks with intelligent workflows); Predictive Everything (spot and resolve issues before they impact users); and AI + Human Collaboration (empower teams with real-time, AI-driven insights).

Brand email with your logo

StatusGator supports custom email branding on our Enterprise plan and as an add-on to other plans, so your customers or end users receive emails that carry your organization’s logo and are sent from your organization’s email address. Previously, the email logo used the same image as your status page. Now, you can upload a custom logo to be used just for your emails. Enjoy improved branding by uploading a logo that fits the email perfectly.

Simplifying Observability: Streamlining Telemetry with a Centralized Pipeline

Modern applications generate a deluge of telemetry data—logs, metrics, and traces—that hold the key to understanding system performance and reliability. However, managing this data effectively is a growing challenge for DevOps teams. Raw telemetry can overwhelm teams with complexity and noise even when collected via robust standards like OpenTelemetry.