High Cardinality in ClickHouse at Scale: What Actually Breaks

Jun 25, 2026 By Prathamesh Sonpatki In Last9

ClickHouse swallows high-cardinality telemetry at ingest, then breaks at query time weeks later. Here is what fails and how we keep it fast in production. Prathamesh works as an evangelist at Last9, runs SRE Stories (where SRE and DevOps folks share their stories), and maintains o11y.wiki, a glossary of observability terms.

Chart Your Team's Analytics Journey with Customizable Dashboards in DX NetOps

Jun 25, 2026 By Helen Burke In Broadcom

DX NetOps now features customizable dashboards that give all users important new capabilities. In addition, through the solution's new integration capabilities, DX NetOps lets users of existing analytics and reporting tools add standardized dashboards over time.

What's New in Network Observability for Summer 2026

Jun 25, 2026 By Sean Armstrong In Broadcom

As a network engineer, you likely face two persistent operational challenges: manually tracking device lifecycles in spreadsheets, and spending scheduled maintenance windows troubleshooting software upgrades. Both take away the time you need to proactively ensure network performance. Over the past six months, we have continued to enhance Network Observability by Broadcom, and these latest enhancements directly address both challenges.

The debugging crisis nobody's talking about: AI, abstraction, and the skills gap

Jun 25, 2026 By John Dietz In Civo

Here's a scenario that's playing out in engineering teams across the industry right now. A developer uses AI to rapidly prototype a microservice. The code works. They deploy it to production. Six months later, something breaks: the system is under load, a database connection pool is exhausted, and the service starts failing in subtle ways. The engineer pulls up the code, but here's the problem: they didn't write it. An AI assistant did. They don't understand the flow deeply, and they don't know where to look first.

Cortex catalog data now flows into Rootly

Jun 25, 2026 By Skyler Wuolle In Cortex

Incident response is a context problem. The first minutes of any incident are spent reconstructing what the affected service is, what it depends on, and who owns it. That reconstruction happens during the worst possible window. The Cortex catalog already holds this data: services, teams, domains, and the relationships between them, maintained by the engineers who run those systems.

Designing the Operational Architecture for Continuous SLA Exposure Governance

Jun 25, 2026 By ScienceLogic In ScienceLogic

Organizations seeking to reduce SLA volatility often attempt incremental enhancements to existing monitoring stacks. While additional analytics layers may improve telemetry visibility, exposure governance cannot function effectively when data, service context, and execution capabilities remain fragmented. Treating exposure management as an add-on capability limits its ability to protect SLAs across interdependent systems in real time.

Where did all my Claude Code tokens go?

Jun 25, 2026 By Annie Freeman In Coralogix

Most teams judge their AI coding agent on two things: the monthly bill and a feeling. The bill tells you what you spent and the feeling tells you whether it seems to be helping, but neither tells you what the agent actually did. As these tools move into the critical path of how software ships, that gap is starting to matter. I wanted to replace the feeling with something I could measure, and to understand which shapes of work affect the bill, so I decided to run an experiment on myself.

What is mutation testing?

Jun 25, 2026 By Roger Winter In CircleCI

A test suite can be all green and hit 100% line coverage and still miss bugs. Coverage measures which lines ran during the tests, not whether the assertions actually caught a defect. A test that calls a function but never checks the return value still counts toward the coverage number. The bug it would have prevented still ships.

How we saved over $3 million in idle compute costs with Datadog Kubernetes Autoscaling

Jun 25, 2026 By Jacob Simonov In Datadog

At Datadog, our broad Kubernetes footprint amplifies the significance of a familiar autoscaling tradeoff: Overprovisioning wastes cloud spend, while underprovisioning threatens reliability. We built Datadog Kubernetes Autoscaling (DKA) to help teams rightsize their workloads by generating intelligent resource recommendations and automating multidimensional workload scaling. Across Datadog, adopting DKA has eliminated more than $3 million in annualized idle compute costs while reducing reliability risks.

How Kubernetes Operators May Conflict With Resource Optimization (And How to Avoid It)

Jun 25, 2026 By Kubex In Densify

A Kubernetes Operator is a method of packaging, deploying, and managing a Kubernetes application. It extends the native Kubernetes API by combining custom resources (CRDs) with a dedicated controller: a custom control loop that continuously watches the state of those resources. The primary purpose of an operator is to automate the operation of complex, stateful applications (like databases, message queues, or monitoring suites) that would otherwise require human operational knowledge to maintain.
