Latest News

Flaky tests: their hidden costs and how to address flaky behavior

Oct 23, 2024 By Bowen Chen In Datadog

Flaky tests are bad—this is a fact implicitly understood by developers, platform and DevOps engineers, and SREs alike. When tests flake (i.e., generate conflicting results across test runs, without any changes to the code or test), they can arbitrarily fail builds, requiring developers to re-run the test or the full pipeline. This process can take hours—especially for large or monolithic repositories—and slow down the software delivery cycle.

Beyond Their Intended Scope: Uzing into Russia

Oct 23, 2024 By Doug Madory In Kentik

The first installment of our new blog series, Beyond Their Intended Scope, covers BGP mishaps that may have escaped the community’s attention but are worthy of analysis. In this post, we review a recent BGP leak that redirected internet traffic through Russia and Central Asia as a result of a path error leak by Uztelecom, the incumbent service provider of Uzbekistan.

Key Metrics to Monitor for a Healthy Kafka Cluster

Oct 23, 2024 By Navdeep Sidhu In meshIQ

Maintaining a healthy Kafka cluster is critical to ensuring your real-time data pipelines run smoothly. However, keeping your Kafka environment in tip-top shape isn’t just about setting it up and letting it run. Regular monitoring of key metrics is essential to catch issues before they escalate, optimize performance, and keep everything humming along smoothly. So, what should we be looking at when it comes to Kafka metrics? Let’s break down the most important ones and how to interpret them.

Understanding Kubernetes Metrics Server: Your Go-to Guide

Oct 23, 2024 By Anjali Udasi In Last9

Learn how the Kubernetes Metrics Server helps monitor resource usage like CPU and memory, ensuring smooth cluster performance and scalability.

AWS X-Ray vs Jaeger - Choosing the Right Distributed Tracing Tool

Oct 23, 2024 By Pavithra Parthiban In Atatus

Distributed tracing has become an essential part of any application's performance monitoring strategy. As businesses adopt distributed architectures, choosing the right tracing tool is crucial for efficient troubleshooting and performance monitoring. The two most prominent choices are AWS X-Ray and Jaeger, each offering unique features and advantages. AWS X-Ray, a managed service by Amazon, simplifies tracing for applications running on AWS.

Infrastructure Monitoring Checklist: What you should monitor

Oct 23, 2024 By Johannes Rauh In Icinga

You want to monitor your infrastructure? Monitoring is essential to ensure system stability, security and optimal performance. Without proper monitoring, small issues can quickly escalate into major problems and affect productivity and service availability. While there is no fixed checklist for infrastructure monitoring and it depends on your setup, there are some key areas that are worth considering when building your own monitoring strategy that fits the needs of your own environment.

Determining a CoPE's Efficacy-and Everything After

Oct 23, 2024 By Nick Travaglini In Honeycomb

As discussed in the first article in this series, a Center of Production Excellence (CoPE) is a more or less formal, provisional subsystem within an organization. Its purpose is to act from within to change that organization so that it’s more capable of achieving production excellence. The series has, to date, focused mainly on how best to construct such a subsystem and what activities it should pursue.

12 Benefits You Get by Scaling with Netdata

Oct 23, 2024 By Netdata Team In netdata

80% of decision-makers globally acknowledge that digital infrastructure is essential for reaching business goals. However, IT infrastructure is becoming increasingly distributed and complex. Organizations are managing hundreds—even thousands—of nodes across cloud, on-premise, and edge environments. This predicament makes effective monitoring across all systems more essential than ever.

The Ultimate List of Incident Management Tools in 2024

Oct 23, 2024 By Hrishikesh Barua In IncidentHub

Incident management tools are important for organizations to effectively handle service outages. With so many tools available, each with a different feature set, it's often difficult to find the one that fits your needs. This article lists the incident management software available in 2024, along with the features of each, to help you choose the right one.

RabbitMQ vs Kafka: Which Is Right for You?

Oct 23, 2024 By Stackify Team In Stackify

For distributed systems and microservices, message brokers play a very important role. Message brokers keep data flowing smoothly between different parts of our applications. Two names that often come up in discussions about message brokers are RabbitMQ and Kafka. But what exactly are they, and how do they differ?
