Observability

The latest news and information on observability for complex systems and related technologies.

TransUnion's Steve Koelpin shares his solution to automate log onboarding

Please join us to hear how Steve led a team effort to cut the time it takes to onboard new logs into his data analytics platform. Steve reduced a process that previously took hours to minutes, increasing developer productivity and letting the logging and analytics team focus more on delivering business value to TransUnion.

Five Blind Spots Solved Through Observability

“Too many cooks spoil the broth.” It’s an old saying we’ve heard many times in childhood. If we put it in today’s IT monitoring context, we could change it to “too many tools spoil the insights and efficiency.” IT teams across organizations have deployed multiple tools over the decades to monitor and track the performance of networks, databases, and applications and to ensure the smooth running of the business.

Authors' Cut: Debugging with the Core Analysis Loop, and What to Build vs Buy

In the old days, the most senior members of an engineering team were the best debuggers. They had built up such an extensive knowledge about their systems that they instinctively knew the right questions to ask and the right places to look. They even wrote detailed runbooks in an attempt to identify and solve every possible issue and possible permutation of an issue.

Unified Observability is the Solution IT Has Been Waiting For

For years, IT teams have relied on observability tools to (theoretically) provide intelligence and insights into operating conditions within an organization’s digital infrastructure. But most of these tools have come with significant shortcomings that leave IT teams wanting more.

Grafana Labs founders on the future of observability and how to scale an open source company

“Overwhelming.” It was the only word Grafana Labs CEO and Co-founder Raj Dutt could use to describe how it felt to look out at the sea of more than 600 Grafanistas gathered together in Whistler, British Columbia, for the first company-wide employee event in two years.

The Next Frontier for Observability: Data Ownership with OpenTelemetry

Observability is a mindset that lets you use data to answer questions about business processes. In short, collecting as much data as possible from the components of your business — including applications and key business metrics — then using an AI-powered tool to help consolidate and make sense of this huge volume of data gives you observability into your business. Having observability for your business and applications lets you make smarter decisions, faster.

A Data Lake Is Not Enough to Keep Your Observability Ambitions Afloat

Recently I heard one of our prospects talk about a competitor who was promoting their data lake and ask, “How are we different from that?” His question got me thinking about why a data lake alone does not provide the depth of observability you really need. The goal of observability is to help SREs, IT Ops and DevOps teams run their IT systems with close-to-zero downtime. Consolidating data from across your environment into a data lake is certainly a good step.

Datasets, Traces, and Spans, Oh My!

If you've stumbled (or purposefully landed) on this blog post, chances are you are new to the observability space (o11y for short) or diving deeper into it. Suffice it to say, you’re not in Kansas anymore. Honeycomb in a lot of ways can serve as a yellow brick road into o11y, and this article should serve as an introduction to how Honeycomb facilitates implementing o11y in applications and distributed services.

Top 12 Site Reliability Engineering (SRE) Tools

Ben Treynor Sloss, then VP of Engineering at Google, coined the term “Site Reliability Engineering” in 2003. Site Reliability Engineering, or SRE, aims to build and run scalable and highly available systems. The philosophy behind Site Reliability Engineering is that developers should treat errors as opportunities to learn and improve. SRE teams constantly experiment and try new things to enhance their support systems.