
Trace Distributed Map states for AWS Step Functions with Datadog

AWS Step Functions offers the Distributed Map state, which lets you coordinate massively parallel workloads within your serverless applications. With this feature, a single Step Functions execution can fan out into up to 10,000 parallel workflows, making it possible to process millions of items efficiently. This capability unlocks new possibilities for large-scale data processing, such as image transformation, log ingestion, or batch analytics.

Grafana Cloud updates: The latest features in Kubernetes Monitoring, Fleet Management, and more

We consistently roll out helpful updates and fun features in Grafana Cloud, our fully managed observability platform powered by the open source Grafana LGTM Stack (Loki for logs, Grafana for visualization, Tempo for traces, and Mimir for metrics). In case you missed them, here’s our monthly round-up of the latest and greatest Grafana Cloud updates.

SwiftPM, CocoaPods, and the Future of Enterprise Development for Apple Platforms

Swift is the default and preferred language for developing applications within the Apple ecosystem. The Swift Package Manager (SwiftPM) has become the de facto dependency manager for Swift, enabling developers to share and reuse code effortlessly. While its elegance lies in its simplicity, there’s a common concern about integrating SwiftPM into robust, enterprise-grade development workflows. This is where JFrog Artifactory shines.

Enterprise Drupal: Why hosting all your apps on one platform matters

For many enterprises, Drupal has been the backbone of their web operations for years. It’s a battle-tested CMS that handles complex content needs with elegance. But business needs have evolved. Today, it’s rare for a company to rely only on Drupal. They are spinning up Python APIs, .NET backend services, Node.js apps, Java microservices — expanding their digital ecosystems around Drupal’s core.

Navigating Shopware logs and slow pages in a real world scenario

A Shopware store goes from smooth to sluggish—pages take 10 seconds to load, even longer in some cases. What happened? In this post, we tell the true story of how one overlooked plugin setting nearly collapsed a storefront, and how it was resolved using native tools. If you’re shipping code in Shopware without clear performance observability, this is your wake-up call. Everything was working, until it wasn’t.

How to Configure Docker's Shared Memory Size (/dev/shm)

Your Node.js app runs fine on your machine. But inside Docker? You start getting weird crashes, such as ENOSPC: no space left on device. Chrome headless tests fail out of nowhere. PostgreSQL throws shared memory errors under load. The problem? It’s probably /dev/shm, the shared memory volume Docker sets up by default. Most containers get just 64 MB of space here.
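As a rough illustration of the idea, the sketch below reports how much space a shared memory mount has and builds a `docker run` command that raises the limit with Docker's `--shm-size` flag. The helper names `shm_usage_mib` and `docker_run_args`, the `1g` size, and the `node:20` image are assumptions made for this example, not part of the original post.

```python
import os
import shutil

MIB = 1024 * 1024


def shm_usage_mib(path="/dev/shm"):
    """Return (total, free) in MiB for the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)
    return usage.total // MIB, usage.free // MIB


def docker_run_args(image, shm_size="1g"):
    """Build a `docker run` argument list with a larger /dev/shm.

    `--shm-size` is Docker's flag for the size of /dev/shm; the default
    value used here (1g) is only an illustrative choice.
    """
    return ["docker", "run", "--rm", f"--shm-size={shm_size}", image]


if __name__ == "__main__":
    # /dev/shm only exists on Linux hosts and containers, so check first.
    if os.path.exists("/dev/shm"):
        total, free = shm_usage_mib()
        print(f"/dev/shm: {total} MiB total, {free} MiB free")
    print(" ".join(docker_run_args("node:20")))
```

Running the size check inside a container before and after changing the flag is one way to confirm that the new limit actually took effect.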

Amazon SQS Metrics: Monitor, Debug, and Optimize Your Message Queues

Message queues quietly take care of a lot—buffering workloads, smoothing traffic spikes, and keeping services connected. But they don’t always get much attention until something feels off. Amazon SQS offers a solid set of metrics to help you understand how your queues are doing, whether you’re scaling well or nearing limits. This blog breaks down the key SQS metrics: where to find them, what they mean, and how to respond when things start to shift.

Introducing Cause Analysis: Instant Triage for Traffic Changes with Kentik AI

Cause Analysis from Kentik is designed to simplify network traffic analysis and rapidly identify the root cause of issues. Learn how this new feature streamlines troubleshooting, makes complex insights accessible, and boosts team efficiency for all users.

Understanding APM and Distributed Tracing in the Observability Stack

To keep modern applications running smoothly, you need more than just basic monitoring. APM (Application Performance Monitoring) gives you a broad overview, tracking metrics like latency, errors, and system health. Distributed Tracing, on the other hand, shows the full journey of each request across services, helping you pinpoint the root cause of slowdowns or failures.

How to Reduce IT Costs on Hardware Refresh Cycles

IT budgets are under pressure, and hardware refresh costs continue to climb. For End User Computing (EUC) and IT professionals, the traditional time-based approach to managing device lifecycles is no longer viable. Simply replacing laptops and desktops every three to five years doesn’t reflect actual device performance, usage patterns, or business needs. The solution? A smarter, data-driven hardware refresh strategy that balances performance, cost-efficiency, and employee experience.