Operations | Monitoring | ITSM | DevOps | Cloud

%term

Which is Better for Monitoring: Datadog or AWS CloudWatch?

Observability is the process of understanding complex systems by analyzing their outcomes and enhancing those outcomes by monitoring events within the system. Today, observability is essential for IT services to achieve a better user experience and optimize software performance. With cloud platforms dominating the IT services landscape, organizations are inclined to deploy their software and hardware systems in the cloud to reduce operational costs and enhance flexibility.

How the Prometheus community is investing in OpenTelemetry

Goutham Veeramachaneni, a product manager at Grafana Labs, and Carrie Edwards, a senior software engineer at Grafana Labs, are both contributors to the Prometheus open source project. This post, which they wrote together, was originally published on the Prometheus.io blog in March 2024. The OpenTelemetry project is an observability framework and toolkit designed to create and manage telemetry data such as traces, metrics, and logs.

Beyond Microservices: Miniservices, Macroservices, and the in between

Containerized microservices have been the gold standard for cloud computing since they replaced the monolith architecture over a decade ago. The flexibility, scalability, and velocity they enable for teams make them an obvious choice. Yet, a strict interpretation of one service for one function doesn’t quite serve everyone, especially when architectures get large. We’ll discuss how flexibility in service architecture might be the way to go.

The Data Lake Dilemma: Why Businesses Need a New Approach

In today’s data-driven landscape, every organization knows the immense value their data holds, but with the explosion of data from diverse sources, traditional data storage and management solutions are proving inadequate. Organizations are urgently seeking new ways to handle their data effectively.

Datadog on Site Reliability Engineering #shorts #datadog #observability

There are many different ways to implement Site Reliability Engineering (SRE). From team structures to roles and responsibilities to planning and prioritization flows, there’s no golden path for how to organize things. As Datadog has shifted from a startup to a quickly-growing public company, we’ve seen our own SRE practice evolve. With over 22,000 customers sending trillions of data points each day, keeping Datadog reliable is critical to our business.

Creating alerts with Grafana | Grafana for Beginners Ep 11

When observing your data with Grafana, you don't need to be glued to your dashboard 24/7. Join Senior Developer Advocate, Lisa Jung to learn how to set up Grafana to keep an eye on your data and alert you if something needs your attention! The following are covered in this episode: ☁️ Grafana Cloud is the easiest way to get started with Grafana dashboards, metrics, logs, and traces. Our forever-free tier includes access to 10k metrics, 50GB logs, 50GB traces and more. We also have plans for every use case.

Using Telegraf to Feed API JSON Data into Kentik NMS

Discover how to harness API JSON data for Kentik NMS using Telegraf in this insightful video by Justin Ryburn. Learn to containerize with Docker Compose, configure Telegraf to collect and transform metrics, and seamlessly integrate them into Kentik's network monitoring system. A step-by-step guide for enhancing your NMS capabilities with API data.

Calico VPP: Empowering High-Performance Kubernetes Networking with Userspace Packet Processing

This is a guest post authored by Nathan Skrzypczak, R&D Engineer at Cisco. Calico VPP, the latest addition to Calico’s suite of pluggable data planes, revolutionizes Kubernetes networking by enabling transparent user-space packet processing. With features such as service load balancing, encapsulation, policy enforcement, and encryption, Calico VPP brings the performance, flexibility, and observability of VPP to Kubernetes networking.