
The latest news and information on observability for complex systems and related technologies.

Should Every Incident Get a Retro?

At a recent training session, Jeli spent a great deal of time covering incident retrospectives and what makes an incident worthy of studying. My colleague Ben Hartshorne asked a fascinating question that caught me by surprise. We had a great discussion, and it made me consider approaches I hadn’t before.

Cost-Cutting Strategies and Smart Tooling Choices to Maximize Your Vendor Budget

Tech debt. Vendor redundancy. System fragmentation. Startups and cloud-born companies are looking at vendors for cost-cutting opportunities. But how do you balance vendor costs and value when those resources and tools bring efficiencies as high as the monthly bills? In this session, Charity Majors and Gergely Orosz share advice on managing spend in a vendor-dependent world.

Monitoring service performance: An overview of SLA calculation for Elastic Observability

Elastic Stack provides many valuable insights for different users. Developers are interested in low-level metrics and debugging information. SREs are interested in seeing everything at once and identifying where the root cause is. Managers want reports that tell them how good service performance is and if the service level agreement (SLA) is met. In this post, we’ll focus on the service perspective and provide an overview of calculating an SLA.

Lightrun Launches New .NET Production Troubleshooting Solution: Revolutionizing Runtime Debugging

Lightrun, the leading Developer Observability Platform for production environments, announced today that it has extended its support to include C# on its plugins for JetBrains Rider, VSCode, and VSCode.dev. With this new runtime support, .NET developers can troubleshoot their apps against .NET Framework 4.6.1+, .NET Core 2.0+, and .NET 5.0+ technologies.

How the All-In Comprehensive Design Fits into the Cribl Stream Reference Architecture

Join Cribl's Ed Bailey and Ahmed Kira as they provide more detail about the Cribl Stream Reference Architecture, which is designed to help observability admins achieve faster and more valuable stream deployment. During this live stream discussion, Ed and Ahmed will explain the guidelines for deploying the comprehensive reference architecture to meet the needs of large customers with diverse, high-volume data flows. They will also share different use cases and discuss the pros and cons of using the comprehensive reference architecture.

The Sun's Setting on Cisco Prime Infrastructure, Rising on SolarWinds Hybrid Cloud Observability

Cisco recently announced its plan to end-of-life (EOL) Cisco Prime Infrastructure. While the announcement offers Cisco DNA Center as an alternative solution, support for multi-vendor environments appears to be decreasing.

Alerting on the User Experience

When your alerts cover systems owned by different teams, who should be on call? We get this question a lot when talking about SLOs. We believe that great SLOs measure things that are close to the user experience. However, it becomes difficult to set up alerting on that SLO, because in any sufficiently complex system, the SLO is going to measure the interaction between multiple services owned by different teams.

Honeycomb's Deployment Protection Rule for GitHub Actions

Today, GitHub announced the public beta of Deployment Protection Rules for GitHub Actions for GitHub Enterprise users. In support of that launch, we’ve partnered with GitHub to create the Honeycomb Deployment Protection Rule (available as a GitHub App). This rule lets you run Honeycomb queries so that you can get real-time performance feedback from your services before deciding whether to prevent deployment of your code to a specific environment.

Observability overload: Insights into the rise of tools, data sources, and environments in use today

With countless observability tools, data sources, and environments to juggle, the organizations that deploy and manage today’s distributed applications often face an uphill battle to gain visibility into their application performance. That was a key takeaway from the Grafana Labs Observability Survey 2023, which incorporated input from more than 250 industry practitioners who are all too familiar with these complexities.