If you’re an experienced engineer, you likely have comprehensive observability and monitoring set up for your production systems, so when issues arise you’re empowered to resolve them quickly. Yet far too many systems out there, especially smaller and simpler ones, run with only rudimentary observability or none at all. When one of those applications goes down or starts to perform poorly, it can be very hard to pinpoint and resolve the issue.
You are running a complex, mission-critical application, and you understand you need an advanced observability solution to troubleshoot efficiently and prevent issues proactively. Yet you have a choice to make: should you choose a “fully managed” SaaS solution such as Datadog, New Relic, or Dynatrace, or should you pick an open-source solution that you can host yourself?
I’ve heard this story many times from production engineers: “We use tools like Datadog and New Relic, but to keep costs from skyrocketing, we’re only monitoring our most critical services. We’re storing just 10% of our logs and traces and only the metrics we consider essential.” It’s a frustrating situation: engineers want full visibility across their systems, but cloud storage costs make it impossible to monitor everything.
eBPF is a powerful technology used by many observability solutions, including Coroot. While web-based observability tools like Coroot are invaluable, there’s a specific class of eBPF tools that often goes overlooked (except by Brendan Gregg, of course): eBPF Linux command-line tools. These tools are essential for diving deep into complex performance issues. But first – why would you need them at all if you already have a convenient, observability-focused web application?
In this blog post we will look at the runqlat and runqslower commands, which are available in both the BCC and bpftrace tool collections. One of the core functions of the Linux operating system is to schedule processes across the available CPUs. When a service receives a request, Linux typically needs to schedule the process handling that request onto one of the CPUs. This can be almost instantaneous if an idle CPU is available, or it can take significant time if all CPUs are busy running other processes.
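To make that concrete, here is a minimal sketch (assuming BCC is installed and the script runs as root) of what runqlat measures: the time between a task being woken up and the scheduler actually switching it onto a CPU. It is not the real runqlat implementation – it ignores cases the actual tool handles, such as tasks preempted while still runnable – but it shows the core idea with BCC from Python.

```python
from time import sleep
from bcc import BPF

# Simplified version of what runqlat measures: time from wakeup to on-CPU.
bpf_text = """
#include <uapi/linux/ptrace.h>

BPF_HASH(start, u32, u64);      // pid -> wakeup timestamp
BPF_HISTOGRAM(dist);            // log2 histogram of wait time in usecs

// A task became runnable: remember when it was enqueued.
TRACEPOINT_PROBE(sched, sched_wakeup) {
    u32 pid = args->pid;
    u64 ts = bpf_ktime_get_ns();
    start.update(&pid, &ts);
    return 0;
}

// The scheduler picked a task to run: how long did it wait in the queue?
TRACEPOINT_PROBE(sched, sched_switch) {
    u32 pid = args->next_pid;
    u64 *tsp = start.lookup(&pid);
    if (tsp == 0)
        return 0;                       // wakeup was not seen
    u64 delta_us = (bpf_ktime_get_ns() - *tsp) / 1000;
    dist.increment(bpf_log2l(delta_us));
    start.delete(&pid);
    return 0;
}
"""

b = BPF(text=bpf_text)
print("Tracing run queue latency for 10 seconds...")
sleep(10)
b["dist"].print_log2_hist("usecs")
```

Running it for a few seconds prints a log2 histogram of queueing delays in microseconds, which is essentially the output format runqlat itself uses.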
In this blog post we will look at the gethostlatency command, which is available in both the BCC and bpftrace tool collections. Most applications and services use hostnames rather than IP addresses to communicate with other services. This means that before a connection to a service can be established, another request needs to be made – to DNS (the Domain Name System). As such, DNS performance and availability affect virtually every service in your environment, yet they are often ignored.
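As an illustration, here is a stripped-down sketch of the idea behind gethostlatency (not the full tool, which also instruments gethostbyname() and reports the hostname being resolved). It assumes BCC is installed and root privileges, and simply times each getaddrinfo() call in libc with a uprobe/uretprobe pair:

```python
from bcc import BPF

# Time each getaddrinfo() call: entry records a timestamp,
# return computes the latency and prints it to the trace pipe.
bpf_text = """
#include <uapi/linux/ptrace.h>

BPF_HASH(start, u32, u64);          // thread id -> entry timestamp

int do_entry(struct pt_regs *ctx) {
    u32 tid = bpf_get_current_pid_tgid();
    u64 ts = bpf_ktime_get_ns();
    start.update(&tid, &ts);
    return 0;
}

int do_return(struct pt_regs *ctx) {
    u32 tid = bpf_get_current_pid_tgid();
    u64 *tsp = start.lookup(&tid);
    if (tsp == 0)
        return 0;
    u64 delta_us = (bpf_ktime_get_ns() - *tsp) / 1000;
    bpf_trace_printk("getaddrinfo latency: %llu us\\n", delta_us);
    start.delete(&tid);
    return 0;
}
"""

b = BPF(text=bpf_text)
# Attach to the libc resolver entry point; "c" resolves to the system libc.
b.attach_uprobe(name="c", sym="getaddrinfo", fn_name="do_entry")
b.attach_uretprobe(name="c", sym="getaddrinfo", fn_name="do_return")
print("Tracing getaddrinfo() latency... Ctrl-C to end.")
b.trace_print()
```

The trace pipe output already prefixes each line with the calling process and PID, so even this minimal version makes slow resolutions and the services affected by them visible.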
In this blog post we will look at the filetop command, which is available in the BCC tool collection. Disk I/O is one of the key activities on any system, especially for data-intensive systems running databases or serving files.
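To give a flavor of the idea, here is a stripped-down sketch (not the real filetop, which also tracks writes, rates, and more file metadata) that hooks vfs_read() with BCC, sums the bytes requested per file name, and prints the top readers after a short interval. It assumes BCC is installed and the script runs as root.

```python
from time import sleep
from bcc import BPF

# Count bytes requested per file name on every vfs_read() call.
bpf_text = """
#include <uapi/linux/ptrace.h>
#include <linux/fs.h>

struct key_t {
    char name[32];                  // file name (truncated)
};

BPF_HASH(read_bytes, struct key_t, u64);

int trace_vfs_read(struct pt_regs *ctx, struct file *file,
                   char __user *buf, size_t count) {
    struct dentry *de = file->f_path.dentry;
    struct qstr d_name = de->d_name;
    if (d_name.len == 0)
        return 0;

    struct key_t key = {};
    bpf_probe_read_kernel(&key.name, sizeof(key.name), (void *)d_name.name);

    u64 zero = 0, *val = read_bytes.lookup_or_try_init(&key, &zero);
    if (val)
        (*val) += count;            // requested size, not bytes actually read
    return 0;
}
"""

b = BPF(text=bpf_text)
b.attach_kprobe(event="vfs_read", fn_name="trace_vfs_read")
print("Tracing reads for 5 seconds...")
sleep(5)

# Print the 10 most-read files in the interval.
top = sorted(b["read_bytes"].items(), key=lambda kv: kv[1].value, reverse=True)[:10]
for key, val in top:
    print(f"{key.name.decode('utf-8', 'replace'):32} {val.value} bytes")
```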
When discussing the technical foundations of observability, several key components, often referred to as the “pillars,” emerge. While there is no universally agreed-upon number of pillars, this post will focus on four fundamental elements: metrics, logs, traces, and profiles. Due to the vast amount of data generated by metrics, logs, and traces, sampling is often employed to reduce data volume while maintaining representative information.
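As a small, library-agnostic illustration of how sampling can cut volume while keeping data representative, the sketch below (the function name and rate are illustrative, not taken from any specific tool) makes a deterministic keep-or-drop decision from a trace ID, so all spans of a sampled trace stay together:

```python
import hashlib

SAMPLE_RATE = 0.10  # keep roughly 10% of traces

def keep_trace(trace_id: str, rate: float = SAMPLE_RATE) -> bool:
    """Head-based sampling: hash the trace ID so that every span
    belonging to the same trace gets the same keep/drop decision."""
    bucket = int(hashlib.sha256(trace_id.encode()).hexdigest(), 16) % 10_000
    return bucket < rate * 10_000

# Example: only traces whose ID hashes into the kept bucket are exported.
if keep_trace("4bf92f3577b34da6a3ce929d0e0e4736"):
    print("export spans for this trace")
```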
We’re excited to announce the release of Coroot v1.4! Along with various UI improvements, this update brings a new feature: network traffic monitoring. Now you can easily see how much data is being transferred between your applications and, more importantly, how much it costs. In this post, we’ll dive into the enhancements and new features included in this release.