Operations | Monitoring | ITSM | DevOps | Cloud

Datadog

Monitor AWS control plane API usage metrics in Datadog

AWS Service Quotas helps you manage limits on the number of resources or API operations that are possible for a given AWS service. Hitting such limits could cause operational disruptions related to getting rate limited on the critical APIs that your applications rely on or being unable to provision additional AWS resources.

Streamline incident management with BigPanda's offering in the Datadog Marketplace

BigPanda is a domain-agnostic AIOps platform that helps organizations detect and resolve incidents in their complex IT environments. By unifying and correlating data from monitoring, change, and topology tools, BigPanda enables teams to quickly pinpoint the root cause of issues and prevent costly outages.

Datadog on Chaos Engineering

As you scale your applications, remaining resilient to underlying network failures, resource constraints introduced by other applications, or spikes in traffic can become exponentially more complex, even with very thorough testing and processes. Chaos engineering is a discipline that encourages experimenting in production and injecting controlled failures into the system to understand how the system will react in such conditions and to improve its reliability.

Google Cloud, Vodafone and Datadog SRE Panel Webinar

Since originating at Google, site reliability engineering (SRE) has enabled countless teams to effectively manage large-scale systems, improve the stability of complex services, and automate operational tasks using software. In this SRE panel, Yuri Grinshteyn (Customer Reliability Engineer, Google) will speak about the core principles of SRE and how the culture is practiced at Google. He will be joined by Llywelyn Griffith-Swain (SRE Manager, Vodafone), who will share Vodafone’s story of adopting SRE, lessons learned, and their best practices for maintaining the cultural shift across teams.

Planning Center: Simplifying observability and reducing MTTR in a serverless world, with Datadog

Justin Bodeutsch, Systems Administrator at Planning Center discusses how Datadog’s alerting, log management, serverless, and infrastructure monitoring tools have simplified internal processes and been instrumental in minimizing MTTR across the business.

Announcing support for Oracle Arm-based Ampere A1 instances

Arm processors have long been at the center of mobile computing, powering billions of smartphones, tablets, smartwatches, and other IoT devices. Today, these processors are beginning to see broader adoption in the cloud as they promise better performance, higher energy efficiency, and lower costs than their x86-based predecessors. Just this week, Oracle announced its new Oracle Cloud Infrastructure Ampere A1 Compute platform, built on the Ampere Altra Arm processor.

Announcing support for Amazon ECS Anywhere

Amazon Elastic Container Service (ECS) is a managed compute platform for containers that was designed to be simple to configure, with opinionated defaults to help users get started quickly. ECS customers can run containerized workloads on either Amazon EC2 instances or the serverless Fargate platform without having to maintain a control plane—and can easily integrate ECS with other AWS resources, like Network Load Balancers, to architect their infrastructure.

Introducing Datadog's Lambda extension

AWS Lambda extensions enable you to seamlessly integrate third-party tooling with your Lambda environment so you can run custom code or monitoring agents alongside your functions. We’ve partnered with AWS to create a Lambda extension that offers a more cost-effective, simplified process for collecting data from your functions.

Use the improved infrastructure list to track your hosts' health

Datadog’s infrastructure list provides a central, high-level view of every host in your environment and pulls together metadata and relevant metrics from across Datadog to help you get the full picture of each one. You can easily filter and sort the list using any host tags, letting you quickly view the status of the parts of your infrastructure you need.