
The latest News and Information on Service Reliability Engineering and related technologies.

From Promise to Practice: What Real AI SRE Can Actually Do When Production Breaks

Jan 4, 2026 By Itiel Shwartz In Komodor

We’ve written before about the advantages of training an AI SRE on real telemetry data rather than generic Kubernetes documentation. We’ve explained why RAG augmentation based on actual high-scale workload patterns produces better results than LLMs trained on generic scenarios or forum threads. The theory makes sense, the architecture is sound, and the approach is defensible.

Podman vs Docker 2026: Security, Performance & Which to Choose

Jan 2, 2026 By Anjali Udasi In Last9

When it comes to containerization technologies, Podman and Docker are the two giants that often come up in conversation. Both have revolutionized how we build, deploy, and manage containers, but what sets them apart? In this blog, we'll dive deep into a side-by-side comparison of Podman and Docker. We'll cover everything from architecture to security, performance, and compatibility.

Datadog Pricing 2026: Full Cost Breakdown + How to Save 40-90%

Jan 2, 2026 By Anjali Udasi In Last9

When it comes to monitoring and observability tools, Datadog is often one of the first names that comes to mind. But while Datadog’s features are widely discussed, its pricing often remains a topic of confusion. How much does Datadog cost, and what factors influence your bill? This guide breaks down Datadog pricing to help you better understand its structure, hidden nuances, and whether it’s the right fit for your needs.

Why High-Cardinality Metrics Break Everything

Dec 31, 2025 By Prathamesh Sonpatki In Last9

High-cardinality metrics are one of those ideas that sound obviously right - until you try to use them in production. In theory, they promise precision. Instead of averages and rollups, you get specificity: per-request, per-userid, per-container, per-feature insights. The kind of detail we all immediately want when something is on fire. And then things start breaking. Not immediately. Not loudly. But quietly.
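
As a rough illustration of why that detail gets expensive, the sketch below multiplies the number of distinct values per label to get a worst-case count of time series for one metric. The label names and counts are made up for illustration, and the function name `series_count` is hypothetical; real systems usually see fewer series because not every label combination occurs.

```python
from math import prod


def series_count(label_cardinalities: dict[str, int]) -> int:
    """Worst-case number of time series for a single metric.

    Assumes every combination of label values actually occurs,
    so this is an upper bound rather than a measurement.
    """
    return prod(label_cardinalities.values())


# Hypothetical label sets, chosen only to show the growth.
coarse = {"endpoint": 50, "status": 5, "container": 200}
detailed = {**coarse, "user_id": 10_000}

print(series_count(coarse))    # upper bound without per-user labels
print(series_count(detailed))  # upper bound once user_id is added
```

Adding a single per-user label multiplies the upper bound by the number of users, which is the kind of quiet growth the post describes.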

7 Kubernetes Predictions for 2026 - AI Will Push SRE to its Limit

Dec 29, 2025 By Itiel Shwartz In Komodor

As AI workloads shift from training to massive-scale inference, SRE teams are about to feel even more pressure. GPU-heavy computing is breaking the assumptions today’s clusters were built on, while enterprises are beginning to trust autonomous operations and cost pressure is pushing consolidation across the cloud-infrastructure stack.

Blameless Postmortem: Foundation of Site Reliability

Dec 23, 2025 By Nuno Tomas In isDown

When systems fail, the instinct to find someone to blame runs deep. But what if assigning fault actually makes your systems less reliable? A blameless postmortem culture transforms how teams learn from incidents, creating stronger systems and more effective incident response processes.

Platform Engineering: Error Budgets Explained Simply #shorts

Dec 23, 2025 By Last9 - Monitoring for AI Native SDLC In Last9

Platform engineering provides powerful tools that handle a lot under the hood. Learn how to calculate your remaining error budget with a simple formula using real numbers and objective statements.
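
The video's exact formula is not reproduced here. A common way to express remaining error budget for a request-based SLO, shown as a minimal sketch with made-up numbers, is one minus the share of allowed failures already used. The function name `remaining_error_budget` and its parameters are assumptions for illustration.

```python
def remaining_error_budget(slo_target: float, total_requests: int,
                           failed_requests: int) -> float:
    """Fraction of the error budget still unspent for a request-based SLO.

    slo_target is the objective as a fraction, e.g. 0.999 for 99.9%.
    The result goes negative once the budget is overspent.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    # Failures the objective tolerates over this window.
    allowed_failures = (1.0 - slo_target) * total_requests
    if allowed_failures == 0:
        raise ValueError("no requests in the window, budget is undefined")
    return 1.0 - failed_requests / allowed_failures


# Hypothetical window: 99.9% objective, 1,000,000 requests, 250 failures.
budget_left = remaining_error_budget(0.999, 1_000_000, 250)
print(f"{budget_left:.0%} of the error budget remains")
```

With these example numbers the objective tolerates about 1,000 failed requests, so 250 failures leave roughly three quarters of the budget.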

Implementing SLOs: Our Scale Mistakes and Successes #shorts

Dec 23, 2025 By Last9 - Monitoring for AI Native SDLC In Last9

30 minutes of eating crow! Learn from our SLO mistakes at Weave. Discover pitfalls and shortcuts to doing it right the first time. Avoid our wrong, wrong, wrong, wrongs!

OpenTelemetry Metrics: Traces, Logs & Prometheus Integration #shorts

Dec 23, 2025 By Last9 - Monitoring for AI Native SDLC In Last9

OpenTelemetry aims to link metrics to traces and logs, offering OpenCensus users a seamless migration path. Work with existing protocols like Prometheus. Leverage existing tooling without learning something completely new.

OpenTelemetry: Components, SDKs, and Middleware Explained #shorts

Dec 23, 2025 By Last9 - Monitoring for AI Native SDLC In Last9

OpenTelemetry explained: standards, SDKs for various languages (Ruby, Python, Go), and middleware tools. Deploy these to pre-process data and send it to your destination.
