This Month in Datadog - October 2024

On the October episode of This Month in Datadog, Jeremy Garcia (VP of Technical Community and Open Source) covers unified Error Tracking, Security Operational Metrics, and a new Datadog Serverless feature for retrying or redriving failed AWS Step Functions executions directly from Datadog. Later in the episode, Shri Subramanian (Group Product Manager) spotlights Datadog LLM Observability’s native integration with Google Gemini. Also featured are our blog posts Operator vs.

Application Performance Monitoring (APM) Guide for DevOps Teams in 2024

Application Performance Monitoring (APM) has become a critical component for DevOps teams that need to keep their applications fast and reliable. This guide covers what modern DevOps teams need to know about implementing and optimizing an APM strategy.

What is a Network Error? Understanding and Fixing the 12 Most Common Network Errors

We’ve all experienced those frustrating moments when an error code pops up unexpectedly and forces us to stop everything we’re doing. Nobody wants to see an HTTP 404 (Not Found) or 500 (Internal Server Error). Whether it’s sluggish connections, dropped calls, or websites refusing to load, the instinct is often to try quick fixes, browse a few “how-to” articles, or simply wait for the issue to pass.
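The 404 and 500 codes mentioned here are HTTP status codes, and the first digit of each tells you which side is at fault: 4xx codes are client errors, while 5xx codes are server errors. The following is a minimal sketch that uses only Python’s standard-library `http.HTTPStatus` enum. The `describe_status` helper is a hypothetical name chosen for illustration and does not come from the original article.

```python
from http import HTTPStatus

# HTTP status classes, keyed by the first digit of the code.
_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe_status(code: int) -> str:
    """Return a label such as '404 Not Found (client error)'."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    status_class = _CLASSES[code // 100]
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # The code falls in a valid class but is not registered in the enum.
        phrase = "Unknown"
    return f"{code} {phrase} ({status_class})"


if __name__ == "__main__":
    for code in (404, 500):
        print(describe_status(code))
```

Running the script prints `404 Not Found (client error)` and `500 Internal Server Error (server error)`. A 4xx code usually points at the request, such as a wrong URL. A 5xx code points at a failure on the server side, which is why the fixes for the two differ.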

Maximize Azure Stack HCI Performance: Proven Resource Optimization Techniques

Looking to optimize your Azure Stack HCI and boost the efficiency of your on-prem infrastructure? Watch this on-demand webinar, tailored for IT professionals managing Azure Stack HCI environments, to learn actionable strategies for improving performance and reducing costs.

Building Resilience from Architecture to Production with AWS & Gremlin

Unreliable software can have a painful impact on your customers and your business, as we’ve all seen and felt during high-profile outages. And while building on the cloud with AWS unlocks improved scaling and reliability capabilities, the complexity of modern distributed systems can introduce reliability risks that lead to outages. How can you be sure your systems are resilient to failure when they’re based on complex architecture, built by hundreds of teams, and updated almost constantly?

EBS vs. EFS: Which AWS Data Storage Solution Is Best for You?

AWS currently offers eight main data storage services covering different needs, from object storage to file and block storage. The most popular is Amazon S3, an object storage service, followed by Amazon Elastic Block Store (EBS) and Amazon Elastic File System (EFS). Both EBS and EFS offer persistent, secure, and scalable storage, yet each has a distinct architecture and set of use cases. In this guide, we compare EBS vs. EFS to help you decide which better fits your AWS data storage needs.

Resolving Application Issues Faster with Stackify Retrace

In an agile DevOps environment, developers move quickly and often, making small changes in ongoing sprints. Once applications go live, operations teams (and oftentimes developers themselves) take over performance management and issue resolution while updates and improvements continue. Developers and DevOps teams need a continuous flow of information on how each iteration performs, where it fails, and, worse, whether it introduces new problems.

Netdata's Native Windows Agent: The Best Way to Monitor Windows!

We are pleased to announce a significant advancement in system monitoring: the launch of Netdata’s first-ever native Windows agent. This release represents a major step forward in our mission to provide comprehensive and efficient monitoring solutions across all platforms. With the introduction of the native Windows agent, we are extending our robust monitoring capabilities to Windows environments, enabling seamless and unified monitoring across diverse infrastructures.