
The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Now you can use Sentry Insights to trigger alerts and debug issues

You deploy a fix late Friday and spend the weekend refreshing dashboards, hoping nothing breaks. You shouldn’t have to babysit a dashboard to know when something’s wrong. With the latest updates to Insights, you can now create alerts directly from any chart. Whether it’s a spike in 4xx errors after a deploy, a jump in P95 latency for an API endpoint, or a drop in throughput for a background job, you can set up alerts with just two clicks.
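Sentry computes these aggregates for you. The sketch below only illustrates what a P95 latency threshold check evaluates over a window of request latencies. It uses the nearest-rank percentile method, and the 500 ms threshold and sample window are hypothetical. Sentry's own aggregation may differ, and none of this is Sentry's API.

```python
def p95(latencies_ms):
    """95th percentile of the samples using the nearest-rank method."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    # Nearest rank = ceil(0.95 * n), computed with integers to avoid float rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def should_alert(latencies_ms, threshold_ms):
    """True when the window's P95 latency is above the (hypothetical) threshold."""
    return p95(latencies_ms) > threshold_ms


# Hypothetical 20-request window: two slow requests push P95 past 500 ms.
window = [110] * 18 + [900, 900]
print(p95(window), should_alert(window, threshold_ms=500))  # 900 True
```

With only one slow request in the same window ([110] * 19 + [900]), P95 stays at 110 ms and no alert fires, which is why a P95 alert reacts to a sustained slow tail rather than a single outlier.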

Trace Distributed Map states for AWS Step Functions with Datadog

AWS Step Functions offers the Distributed Map state, which lets you coordinate massively parallel workloads within your serverless applications. With this feature, a single Step Functions execution can fan out into as many as 10,000 parallel child workflow executions, making it possible to process millions of items efficiently. This capability unlocks new possibilities for large-scale data processing, such as image transformation, log ingestion, or batch analytics.

What is log tagging and how to configure it in Site24x7

In this video, learn what Site24x7 log tags are and how to use them to configure, categorize, filter, and monitor your logs more effectively, so you can create custom log tags that give you full visibility into your logs. Whether you're an IT professional, DevOps engineer, or security analyst, this video will help you create smarter tags that support better monitoring decisions.

Infrastructure monitoring with Site24x7 | Cloud, Kubernetes, and Hybrid Environments

Modern IT environments are dynamic, distributed, and constantly evolving. You need more than traditional monitoring to keep everything running smoothly. Site24x7 is your all-in-one, AI-powered infrastructure monitoring solution. Whether you're overseeing AWS, Azure, GCP, OCI, VMware, or Kubernetes, Site24x7 simplifies it all with a single agent and AI-driven insights.

Grafana Cloud updates: The latest features in Kubernetes Monitoring, Fleet Management, and more

We consistently roll out helpful updates and fun features in Grafana Cloud, our fully managed observability platform powered by the open source Grafana LGTM Stack (Loki for logs, Grafana for visualization, Tempo for traces, and Mimir for metrics). In case you missed them, here’s our monthly round-up of the latest and greatest Grafana Cloud updates.

How to Configure Docker's Shared Memory Size (/dev/shm)

Your Node.js app runs fine on your machine. But inside Docker? You start getting weird crashes: ENOSPC: no space left on device. Headless Chrome tests fail out of nowhere. PostgreSQL throws shared memory errors under load. The culprit is probably /dev/shm, the shared memory mount Docker creates for each container, which defaults to just 64MB.
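A quick way to confirm the diagnosis is to check how big /dev/shm actually is from inside the container. The sketch below reads the mount size with os.statvfs (Unix only) and compares it to a minimum; the 1024 MiB minimum is an illustrative assumption, not a recommendation from the article. If the mount is too small, Docker's `docker run --shm-size` flag is the usual way to raise it.

```python
import os

# Docker's default /dev/shm is 64MB (64 MiB). The 1024 MiB minimum below is a
# hypothetical threshold for illustration only; size it for your workload.
DEFAULT_MIN_MB = 1024


def mount_size_mb(path="/dev/shm"):
    """Total size of the filesystem mounted at `path`, in MiB (Unix only)."""
    st = os.statvfs(path)
    return st.f_frsize * st.f_blocks / (1024 * 1024)


def shm_too_small(size_mb, min_mb=DEFAULT_MIN_MB):
    """True when the shared memory mount is below the chosen minimum."""
    return size_mb < min_mb


if __name__ == "__main__":
    if os.path.exists("/dev/shm"):
        size = mount_size_mb("/dev/shm")
        print(f"/dev/shm: {size:.0f} MiB")
        if shm_too_small(size):
            print("Consider raising it, e.g. docker run --shm-size=1g ...")
    else:
        print("/dev/shm not found on this system")
```

Run inside a container started with default settings, this should report roughly 64 MiB.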

Amazon SQS Metrics: Monitor, Debug, and Optimize Your Message Queues

Message queues quietly take care of a lot—buffering workloads, smoothing traffic spikes, and keeping services connected. But they don’t always get much attention until something feels off. Amazon SQS offers a solid set of metrics to help you understand how your queues are doing, whether you’re scaling well or nearing limits. This blog breaks down the key SQS metrics: where to find them, what they mean, and how to respond when things start to shift.
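As a rough illustration of turning SQS metrics into a decision, the sketch below takes one period's values for three real CloudWatch SQS metrics (ApproximateNumberOfMessagesVisible, ApproximateAgeOfOldestMessage, NumberOfMessagesDeleted) and estimates how long the backlog would take to drain. Fetching the values from CloudWatch is out of scope here, and the QueueSnapshot type, the triage labels, and the 600-second age threshold are all hypothetical choices for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueueSnapshot:
    # Hypothetical container for one period of AWS/SQS CloudWatch values.
    visible: float        # ApproximateNumberOfMessagesVisible
    oldest_age_s: float   # ApproximateAgeOfOldestMessage, in seconds
    deleted: float        # NumberOfMessagesDeleted, summed over the period
    period_s: int = 300   # metric period length in seconds


def drain_time_s(s: QueueSnapshot) -> Optional[float]:
    """Rough seconds to clear the backlog at the current consumption rate,
    ignoring new arrivals. None if nothing is being consumed."""
    rate = s.deleted / s.period_s  # messages processed per second
    if rate <= 0:
        return None
    return s.visible / rate


def assess(s: QueueSnapshot, max_age_s: float = 600) -> str:
    """Hypothetical triage rules; thresholds are illustrative only."""
    if s.visible > 0 and s.deleted == 0:
        return "stalled"         # messages waiting, consumers not deleting any
    if s.oldest_age_s > max_age_s:
        return "falling behind"  # oldest message has waited too long
    return "healthy"


snap = QueueSnapshot(visible=3000, oldest_age_s=120, deleted=1500)
print(drain_time_s(snap), assess(snap))  # 600.0 healthy
```

Looking at the age of the oldest message alongside backlog size helps distinguish a busy but healthy queue from one whose consumers have stopped keeping up.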

Introducing Cause Analysis: Instant Triage for Traffic Changes with Kentik AI

Introducing Cause Analysis from Kentik, designed to simplify network traffic analysis and rapidly identify the root cause of issues. Learn how this exciting new feature streamlines troubleshooting, makes complex insights accessible, and boosts team efficiency for all users.

Understanding APM and Distributed Tracing in the Observability Stack

To keep modern applications running smoothly, you need more than just basic monitoring. APM (Application Performance Monitoring) gives you a broad overview, tracking metrics like latency, errors, and system health. Distributed Tracing, on the other hand, shows the full journey of each request across services, helping you pinpoint the root cause of slowdowns or failures.
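To make the "full journey of each request" concrete, here is a minimal sketch of how a trace's spans can point to a slowdown. It models spans loosely after OpenTelemetry-style parent/child spans (the Span fields here are a simplified assumption, not any vendor's API) and computes each span's self time: its duration minus the time covered by its direct children, merging overlapping children so concurrent calls are not double-counted. The span with the most self time is a reasonable first suspect.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    # Simplified, hypothetical span model; real tracers carry more fields.
    span_id: str
    parent_id: Optional[str]
    name: str
    start_ms: float
    end_ms: float


def self_time_ms(span: Span, spans: List[Span]) -> float:
    """Span duration minus the union of its direct children's time ranges."""
    intervals = sorted(
        (max(c.start_ms, span.start_ms), min(c.end_ms, span.end_ms))
        for c in spans
        if c.parent_id == span.span_id
    )
    covered = 0.0
    cur_start = cur_end = None
    for s, e in intervals:
        if e <= s:
            continue  # child lies entirely outside the parent
        if cur_end is None or s > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            cur_end = max(cur_end, e)  # merge overlapping (concurrent) children
    if cur_end is not None:
        covered += cur_end - cur_start
    return (span.end_ms - span.start_ms) - covered


def bottleneck(spans: List[Span]) -> Span:
    """Span that spends the most time in its own work."""
    return max(spans, key=lambda sp: self_time_ms(sp, spans))


# Hypothetical trace: gateway -> auth, orders; orders -> db, payment.
trace = [
    Span("1", None, "gateway", 0, 500),
    Span("2", "1", "auth", 10, 40),
    Span("3", "1", "orders", 50, 480),
    Span("4", "3", "db", 60, 100),
    Span("5", "3", "payment", 100, 460),
]
print(bottleneck(trace).name)  # payment
```

APM metrics would show that the gateway's latency is high; the span breakdown shows that most of it is spent in the payment call.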

How to Reduce IT Costs on Hardware Refresh Cycles

IT budgets are under pressure, and hardware refresh costs continue to climb. For End User Computing (EUC) and IT professionals, the traditional time-based approach to managing device lifecycles is no longer viable. Simply replacing laptops and desktops every three to five years doesn’t reflect actual device performance, usage patterns, or business needs. The solution? A smarter, data-driven hardware refresh strategy that balances performance, cost-efficiency, and employee experience.