Operations | Monitoring | ITSM | DevOps | Cloud

Observability

The latest News and Information on Observabilty for complex systems and related technologies.

ICYMI: Honeycomb Developer Week: The Partner Ecosystem

We know that you value collaboration. That’s why we share incident reviews and learnings—because we believe the entire community benefits by working together transparently. In the spirit of working better together, we invited ecosystem partners from ApolloGraph, Cloudflare, LaunchDarkly, and PagerDuty to present at Honeycomb Developer Week, a three-day event filled with snackable, time-efficient learning sessions to help you uplevel your observability skills.

Operations Analytics - The Next Big Thing

Cloud Data Warehouses (CDW) were designed to support business intelligence use cases focused on historical data analysis, but less so on “what is happening now?” class of queries. We think operational analytics are the next big focus and we want to discuss the space and how enterprises will connect their operational data to these new tools to get results right now instead of next week.

The Importance of Network Insights in Achieving Full End-To-End Observability

When we talk about observability, we tend to focus first and foremost on the metrics, logs, and traces that you can collect from applications – such as request rates, error rates, and request duration. Infrastructure-level metrics, like CPU and memory utilization, might factor into the discussion as well. Here’s a third category of critical observability insights that teams tend to overlook: the network.

Why cloud native requires a holistic approach to security and observability

Like any great technology, the interest in and adoption of Kubernetes (an excellent way to orchestrate your workloads, by the way) took off as cloud native and containerization grew in popularity. With that came a lot of confusion. Everyone was using Kubernetes to move their workloads, but as they went through their journey to deployment, they weren’t thinking about security until they got to production.

The Importance of Observability for the SRE

The term Site Reliability Engineer (SRE) first appeared in Google in the early 2000s. In Google’s 2016 SRE Book, Benjamin Treynor Sloss wrote that, generally speaking, “an SRE team is responsible for the availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning of their service(s).” This means that the SRE teams at Google decide how a system should run in production as well as how to make it run that way.

Observability - Software & Tools.

A developer's viewpoint is distinct. It can be difficult to keep track of operations and detect the fault that is causing the software to malfunction when handling numerous sectors. What if you could detect the issue ahead of time and fix it as soon as possible? The tactics that we concentrate on and put into action are those that assist us in properly managing our tasks. Knowing about observability makes this possible. Let's take a closer look at it in this blog.