
The latest News and Information on Service Reliability Engineering and related technologies.

We Built an SRE Agent With Memory And It's Transforming Incident Response

If you feel like your incidents are multiplying while your stack gets more complex by the week, you’re not alone. Event volumes keep climbing, signals live in a dozen tools, and human responders are stretched thin. That’s exactly why we built the PagerDuty SRE Agent—a vendor‑agnostic AI teammate that improves with every response to make the next one faster, smarter, and more reliable.

Sidecar or Agent for OpenTelemetry: How to Decide

Getting telemetry out of a distributed system isn’t the hard part. Getting it out cleanly, without noise, drop-offs, or odd performance side-effects — that’s where things get interesting. Before you worry about processors or storage costs, you need a clear plan for where the OTel Collector should run. Most teams narrow this down to two options: a sidecar that sits next to each service, or a node-level agent that handles data for everything running on the node. Both patterns are solid.

OTel Updates: Consistent Probability Sampling Fixes Fragmented Traces

You're sampling 1% of traces in production. A payment request fails at 3 AM. Logs show an error in order-service, but the full picture isn't there because different services made different sampling decisions. order-service kept the trace; payment-service didn't. So you end up checking logs and timestamps across a few services to piece things together. This happens because the usual probability sampling approach makes a separate choice at each service boundary.
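The core of the fix is to derive the keep/drop decision from the trace ID itself instead of letting each service flip its own coin. Here is a minimal Python sketch of that idea — an illustration of the technique, not OpenTelemetry's actual sampler implementation (the OTel spec standardizes this via threshold values carried in trace state):

```python
import hashlib

def consistent_sample(trace_id: str, rate: float) -> bool:
    """Deterministic keep/drop: hash the trace ID into [0, 1) and
    compare against the sampling rate. Every service that sees the
    same trace ID reaches the same decision."""
    digest = hashlib.sha256(trace_id.encode()).digest()
    # Interpret the first 8 bytes as an unsigned int, scale to [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < rate

trace_id = "4bf92f3577b34da6a3ce929d0e0e4736"
# order-service and payment-service each call this independently
# yet always agree, so a trace is either fully kept or fully dropped.
assert consistent_sample(trace_id, 0.01) == consistent_sample(trace_id, 0.01)
```

Because the decision is a pure function of the trace ID, the 3 AM payment failure above would either appear end to end or not at all — no more half-traces stitched together from logs.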

OpenTelemetry Spans Explained: Deconstructing Distributed Tracing

In a microservices architecture, a single user request can pass through multiple services before completing. When performance drops or an error occurs, tracing that journey is the only way to locate the source. Distributed tracing provides that visibility. At its core are OpenTelemetry Spans — units of work that capture what each service does during a request.

Top 11 Ruby APM Tools for 2025: A Performance-Driven Selection

Observability has become a core part of running Ruby applications at scale. Knowing how your app performs — from request latency to background job execution — helps catch slowdowns early and improve reliability. This blog walks through some of the most useful APM tools for Ruby in 2025. Each section highlights what the tool does well, where it fits best, and what kind of visibility it brings to your application's performance.

Top 9 APM Tools for Node.js Performance Monitoring

When a Node.js app slows down, you don’t get a clear picture right away. One service stalls, another spikes in CPU, and somewhere in between, requests start piling up. You can’t fix what you can’t see. Application Performance Monitoring (APM) tools close that gap. They capture request traces, latency, and errors across your stack — showing you what’s running slow and why.

Implement Distributed Tracing with Spring Boot 3

A slow checkout request. A background job stuck waiting on another service. A log message that looks fine until performance drops. In a Spring Boot microservices setup, these are the moments that test your observability. You know something's wrong, but tracing the request across dozens of services feels impossible. Distributed tracing changes that. It connects every span in the request's journey, showing exactly where time is spent and where things start to break down.

Choosing the Right APM for Go: 11 Tools Worth Your Time

If you’re building high-performance systems, Golang has probably earned a spot in your stack. Its speed, lightweight concurrency, and quick compile times make it ideal for scalable APIs, microservices, and distributed systems. But those same qualities that make Go powerful can make performance monitoring tricky. Goroutines run fast and in parallel, which means a simple CPU or memory graph doesn’t always tell you what’s slowing things down.