We can do better failure detection in serverless applications

Traditionally in white-box monitoring, error reporting has been achieved with third-party libraries that catch failures, communicate them to external services, and notify developers whenever a problem occurs. I’m here to argue that for managed services this can be achieved with less effort, with no agents, and without performance overhead.

Introducing the Datadog Cluster Agent

As containers and orchestrators have surged in popularity, they have created highly dynamic environments with rapidly changing workloads—and the need for equally dynamic ways of monitoring them. After all, orchestration technologies like Kubernetes, DC/OS, and Swarm manage container workloads both at the node level and at the cluster level, which means that you need to gather insights from every layer to fully understand the state of your infrastructure.

4 Reasons Why Your Source Maps are Broken

Source maps are awesome, mainly because they let you see your original JavaScript while debugging, which is much easier to read than minified production code. In a sense, source maps are the decoder ring for your secret (minified) code. However, they can be tricky to get working properly. If you’ve run into trouble, the tips below will hopefully help you get everything in working order.

Track the status of your SLOs with the new monitor uptime widget

Service level objectives are an important tool for maintaining application performance, ensuring a consistent customer experience, and setting expectations about service performance for both internal and external users. We are very pleased to announce the availability of a new monitor uptime widget that makes it simple to monitor the status of your SLOs and communicate that status to your teams, executives, or external customers.

Log Patterns: Automatically cluster your logs for faster investigation

Sifting through all your logs to find what you need can be challenging—especially during an outage, when time is critical and you’re flooded with WARN and ERROR messages. To help you immediately surface useful information from large volumes of logs, we developed Log Patterns.

Introducing Stackdriver as a data source for Grafana

It is not uncommon to have multiple monitoring solutions for IT infrastructure these days, as distributed architectures take hold in many enterprises. We often hear from Google Cloud Platform (GCP) customers that they use Stackdriver to monitor resources as well as Grafana and Prometheus for container monitoring. We’ve heard many requests from customers who want to view Stackdriver data in Grafana effortlessly.

Your Journey to the Cloud - 5 Essential Facts About Zenoss + Nutanix

Growing up in Seattle, Washington, I had access to some of the best hiking in the world. If you want to take on a challenge and climb a peak, the Cascade and Olympic mountain ranges offer a variety of journeys for everyone, regardless of skill level and starting point. We were taught at a very young age that everyone needed certain “essentials” before setting out to ensure safety and success.

Challenges and Solutions for Scaling Kubernetes in the Hybrid Cloud

When traffic increases, we need a way to scale our application to keep up with user demand. With Kubernetes multi-cluster management through Rancher, scaling has never been easier or more efficient. Read on to learn about scaling Kubernetes and the challenges you might face when managing a hybrid cloud environment.