APM

The latest news and information on Application Performance Monitoring and related technologies.

Provisioning and Autoscaling

Datadog, the observability platform used by thousands of companies, runs on dozens of self-managed Kubernetes clusters in a multi-cloud environment, adding up to tens of thousands of nodes, or hundreds of thousands of pods. This infrastructure is used by a wide variety of engineering teams at Datadog, with different feature and capacity needs.

JS Toolbox 2024: Essential picks for modern developers (Overview)

Staying ahead of the curve in JavaScript development requires keeping on top of the ever-evolving landscape of tools and technologies. As we head into 2024, the sprawling world of JavaScript development tools will continue to transform, offering more refined, efficient, and user-friendly options. This ‘JS Toolbox 2024’ series is your one-stop shop for a comprehensive overview of the latest and most impactful tools in the JavaScript ecosystem.

Observability vs. APM: What to Know on Your Monitoring Journey

In the ever-evolving landscape of software development and IT operations, monitoring tools play a pivotal role in ensuring the performance, reliability, and availability of your applications. Two key disciplines in this domain are observability and Application Performance Management (APM). This post will help you understand the nuances between observability and APM, exploring their unique characteristics, similarities, benefits and differences.

101 Guide to RabbitMQ Metrics Monitoring

This guide covers the key metrics for efficiently monitoring RabbitMQ. We will also talk about the built-in RabbitMQ monitoring tools with which you can start monitoring your RabbitMQ instances. In fast-paced, data-driven applications where data flows between systems at lightning speed, the reliability and efficiency of your messaging infrastructure can make or break your whole application.
Sponsored Post

Improving API error responses with the Result pattern

In the expanding world of APIs, meaningful error responses can be just as important as well-structured success responses. In this post, I'll take you through some of the different options for creating responses that I've encountered during my time working at Raygun. We'll go over the pros and cons of some common options, and end with what I consider to be one of the best choices when it comes to API design, the Result Pattern. This pattern can lead to an API that will cleanly handle error states and easily allow for consistent future endpoint development.

Log Monitoring 101 Detailed Guide [Included 10 Tips]

Log monitoring is the practice of tracking and analyzing logs generated by software applications, systems, and infrastructure components. These logs are records of events, actions, and errors that occur within a system. Log monitoring helps ensure the health, performance, and security of applications and infrastructure. It also helps detect potential issues early, so systems run smoothly and efficiently. In this detailed 101 guide on log monitoring, we will learn.

OpenTelemetry in 2023 - What we learnt from the community and our users

OpenTelemetry has brought a sea change in the world of observability. The idea of the project was to standardize the instrumentation needed for generating telemetry. Teams shouldn’t need to change how they collect data if they want to try a new visualization/backend for the telemetry data. That was the vision. This idea seems to have resonated with the developer and devops communities.

Paving the Road for Proactive Reliability

Expedia Group has thousands of engineers and microservices. Heterogeneity in the infrastructure and technologies used over the years created inefficiencies and a need for a set of automated best practices for its engineering teams. Over the past two years, using a data-driven approach, Kaushik Patel and Nikos Katirtzis have worked on creating a set of platforms that help teams adopt good reliability practices, including chaos engineering, release safety, and automatic failover between cloud regions. In this talk, Kaushik and Nikos will cover the platforms they’ve built, including how they used data to drive their investment decisions.

The Importance of Traces for Modern APM [Part 2]

In part 1, we looked at how the design of traditional monitoring technologies depended heavily on properties of the systems they were intended to monitor. We then showed how those properties began to be undermined by an increase in complexity, an increase that can ultimately be captured by the concept of entropy. In this part, we will explore how increased entropy forces us to rethink what is required for monitoring.