
The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Observability Pricing Models: How to Evaluate Cost, Value, and Predictability

Observability pricing often looks reasonable at the outset, but many organizations discover its real complexity only as environments scale and usage patterns change. As hybrid environments become the default, teams face rising costs, fragmented tools, and pricing models that make spending hard to predict and long-term planning difficult.

Agentless First, Agents When Needed: A Hybrid Approach to Security Telemetry

Security data collection has become a first-class architectural concern for modern SOCs. Once collection is treated as a dedicated layer, separate from analytics and detection, the next question becomes practical: how should telemetry be collected in a way that aligns with this architecture? In the previous article, we examined why this shift occurred. Here, we focus on how different collection models (agent-based, agentless, and hybrid) fit into modern security data collection architectures.

"You Had One Job": Why Twenty Years of DevOps Has Failed to Do It

Let’s start with a question. What is DevOps all about? I’ll tell you my answer. In retrospect, I think the entire DevOps movement was a mighty, twenty-year battle to achieve one thing: a single feedback loop connecting devs with prod. On those grounds, it failed. Not because software engineers weren’t good at their jobs, or didn’t care enough, but because the technology wasn’t good enough.

What is Runtime Context? A Practical Definition for the AI Era

TL;DR: Runtime Context is live, execution-level access to a running production system. It lets engineers and AI agents ask precise questions of running code and get answers immediately, without redeploying or interrupting users. This is the new baseline for reliability.

Fleet Management and Terraform: Use cases and best practices for managing collectors in Grafana Cloud

Earlier this year, we launched Grafana Cloud Fleet Management to address the pain of managing scores of telemetry collectors across departments and environments. We've been excited to see how organizations are using it to manage collectors at scale, but we've also heard from users who aren't sure how Fleet Management fits with their existing infrastructure-as-code tooling. The good news is that Fleet Management is designed specifically to complement, not replace, tools like Terraform.

Paginating large datasets in production: Why OFFSET fails and cursors win

What separates an MVP from a production-ready app is polish, final touches, and the Pareto ‘last 20%’ of work. Many bugs, edge cases, and performance issues surface only after launch, when a stampede of users puts serious strain on your application. If you’re reading this, you’re probably at the 80% mark, ready to tackle the rest.