Operations | Monitoring | ITSM | DevOps | Cloud

The latest news and information on observability for complex systems and related technologies.

A Taste of Observability - Embrace the Cloud With OpenTelemetry

Join Splunk Observability expert Kirk O'Quinn and Monster CICD Lead Graham Bucknell for a conversation on OpenTelemetry (OTel), a powerful open-source project that is transforming how we monitor and trace applications. In this informative session, we will delve into the world of OTel, exploring its history and roadmap, and discuss the lessons, successes, and failures of companies' journeys to OpenTelemetry.

Tracing the Line: Understanding Logs vs. Traces

In the software space, we spend a lot of time defining the terminology that describes our roles, implementations, and ways of working. These terms help us share fundamental concepts that improve our software and let us better manage our software solutions. To help you optimize your software solutions and implement system observability, this blog post shares the key differences between logs and traces.

Top 12 SolarWinds Competitors and Alternatives In 2024

Organizations exploring SolarWinds alternatives often face a critical decision when choosing the right network and infrastructure monitoring solution. While SolarWinds has established itself as a reliable industry standard, companies are increasingly seeking alternatives that offer better alignment with their monitoring needs, budget constraints, and security requirements.

AI Observability with Grafana with Ishan Jain (Grafana Office Hours #29)

In this Grafana Office Hours, Ishan Jain talks about AI Observability with Grafana: what it entails, factors to consider when monitoring and observing LLMs, and how to do it all with Grafana. He is joined by Senior Developer Advocate Nicole van der Hoeven.

LLM Monitoring and Observability

Demand for LLMs is rapidly increasing: it’s estimated that there will be 750 million apps using LLMs by 2025. As a result, the need for LLM observability and monitoring tools is also rising. In this blog, we’ll dive into what LLM monitoring and observability are, why they’re both crucial, and how we can track various metrics to ensure our model isn’t just working but thriving.

SolarWinds Observability Self Hosted 2024.4: Expanded Device Support and Enhanced Wireless Monitoring

Discover the latest features in SolarWinds version 2024.4! This update brings support for a variety of new network devices, including Fortinet SD-WAN, Ruckus, Juniper, Arista, and Extreme Networks wireless access points, plus Meraki switch support via API integration. Join Crystal Taylor, SolarWinds Evangelist, as she takes you through the new wireless monitoring capabilities and shows how your network management just got easier. Watch now to optimize your network oversight and stay ahead with these powerful enhancements!

SolarWinds Observability Self Hosted 2024.4: New Cloud Monitoring for Azure and AWS Databases!

Explore the powerful new features in SolarWinds version 2024.4, now supporting expanded cloud monitoring capabilities! Crystal Taylor, SolarWinds Evangelist, walks you through the latest updates, including Azure Managed Instance, Azure MySQL, Azure PostgreSQL, and Amazon RDS for SQL Server. See firsthand how PostgreSQL and RDS instances are monitored, showcasing detailed charts and metrics like Log IOs, physical data reads, and memory usage. Upgrade now to take full advantage of these new insights and optimize your cloud database performance.

Against Incident Severities and in Favor of Incident Types

About a year ago, Honeycomb kicked off an internal experiment to structure how we do incident response. We looked at the usual severity-based approach (usually using a SEV scale), but decided to adopt an approach based on types, which we hoped would work better as quick definitions shared across multiple departments. This post is a short report on our experience doing it.

Observability as a superpower

With every job I have, I come across a new observability tool that I can’t live without. It’s also something that’s a superpower for us at incident.io: we often detect bugs faster than our customers can report them to us. A couple of jobs ago, that was Prometheus. In my previous job, it was the fact that we retained all of our logs for 30 days, and had them available to search using the Elastic stack (back then, the ELK stack: Elasticsearch, Logstash, and Kibana).

Network Observability: Mastering Infrastructure Data for Smarter IT

If you want to know exactly what’s on your network and how it’s all connected in real time, then network observability is the answer. Network observability pulls data from sources across your network infrastructure to model a detailed view of your systems and how they interact. This lets you understand exactly what’s happening on your network at any given moment so you can optimize performance.