The latest news and information on Service Reliability Engineering and related technologies.

What is APM Tracing?

Sep 3, 2025 By Faiz Shaikh In Last9

APM tracing records the complete execution path of a request as it travels through your system, including database queries, external API calls, cache lookups, message queue events, and inter-service requests. Each step is captured with precise start and end timestamps, duration, and context such as service name, operation name, and relevant attributes. This lets you pinpoint where latency or errors originate without piecing together metrics and logs manually.
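As a rough illustration of the kind of record described above, the sketch below models each step of a request as a span with a service name, operation name, start and end timestamps, and attributes, then picks out the slowest step. The `Span` class, the `slowest_span` helper, and all values are hypothetical and not taken from any particular APM product.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """One timed step in a request (hypothetical, simplified record)."""
    service: str
    operation: str
    start_ms: float
    end_ms: float
    attributes: dict = field(default_factory=dict)

    @property
    def duration_ms(self) -> float:
        # Duration is derived from the recorded start and end timestamps.
        return self.end_ms - self.start_ms


def slowest_span(spans):
    """Return the span with the longest duration."""
    return max(spans, key=lambda s: s.duration_ms)


# Illustrative trace for a single request; every value here is made up.
trace = [
    Span("checkout", "cache.get", 0.0, 2.0, {"cache.hit": False}),
    Span("checkout", "db.query", 2.0, 180.0, {"db.statement": "SELECT ..."}),
    Span("payments", "http.post", 180.0, 230.0, {"http.status_code": 200}),
]

worst = slowest_span(trace)
print(worst.service, worst.operation, worst.duration_ms)
```

With the timing attached to each step, the slow database query stands out without cross-referencing separate metrics and logs.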

A Single Hub for Telemetry: OpenTelemetry Gateway

Sep 1, 2025 By Anjali Udasi In Last9

The OpenTelemetry Gateway (OTel Gateway) is a centralized service that collects, processes, and routes telemetry data—metrics, traces, and logs—across your infrastructure. In a typical setup, each service pushes telemetry directly to an observability backend. While this approach works well for small environments, it becomes increasingly difficult to manage as systems grow.

How to Choose the Right Incident Management Tool for Your Team

Aug 29, 2025 By Vishal Padghan In Squadcast

IT disruptions are inevitable. What separates a resilient organization from the rest is its ability to respond quickly, efficiently, and collaboratively to incidents. The cornerstone of such responsiveness? The right incident management tool. But with a market flooded with tools, each promising to revolutionize your workflows, how do you pick the one that truly fits your team's needs? In this blog, we'll break down the key factors to consider when selecting an incident management tool, ensuring you make an informed decision that enhances your team's effectiveness and reliability.

A Practical Guide to Python Application Performance Monitoring (APM)

Aug 29, 2025 By Anjali Udasi In Last9

When your Python app starts slowing down (maybe queries are taking longer, memory keeps creeping up, or API calls are lagging), basic server metrics won’t tell you why. You need to see what’s happening inside the application itself. That’s the role of Application Performance Monitoring (APM). It gives you a breakdown of database queries, external API calls, memory usage, error rates, and more, so you can connect the dots between code and performance.

What is Database Monitoring

Aug 28, 2025 By Anjali Udasi In Last9

Database monitoring transforms from a reactive troubleshooting exercise into a proactive optimization strategy when you have the right tools and approaches in place. This blog shares practical ways to choose monitoring solutions, set up observability for different database platforms, and design workflows that scale in modern distributed systems.

Incident Response for DevOps, SREs, and IT Teams

Aug 25, 2025 By Sreekar In Spike

That 3 AM alert is never fun. Your heart races as you try to figure out what broke this time, and how fast you can fix it. But with an incident response plan in place, that panic turns into a calm, step-by-step fix. It helps you handle everything, from a server crash to a security breach, in an organized way. In this guide, I’ll walk you through what exactly an incident response plan is, why you need one, its key components, and how to build one.

OpenTelemetry API vs SDK: Understanding the Architecture

Aug 25, 2025 By Anjali Udasi In Last9

When you're instrumenting applications with OpenTelemetry, you'll encounter two core components: the API and the SDK. The API defines what telemetry data looks like and how it is created, while the SDK handles how that data is processed and exported. Understanding this split helps you build more maintainable observability and avoid tight coupling between your business logic and telemetry infrastructure.
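The split described above can be sketched with a toy example. This does not use the real OpenTelemetry packages: the `Tracer` interface, `NoOpTracer`, and `InMemorySdkTracer` are hypothetical stand-ins meant only to show business logic depending on an API while a separate implementation decides what happens to the data.

```python
from abc import ABC, abstractmethod


class Tracer(ABC):
    """API side (hypothetical): the interface application code depends on."""

    @abstractmethod
    def record(self, name: str, duration_ms: float) -> None: ...


class NoOpTracer(Tracer):
    """Stand-in for 'no SDK configured': calls are accepted and dropped."""

    def record(self, name: str, duration_ms: float) -> None:
        pass


class InMemorySdkTracer(Tracer):
    """SDK side (hypothetical): decides how data is processed and exported."""

    def __init__(self) -> None:
        self.exported = []

    def record(self, name: str, duration_ms: float) -> None:
        # A real SDK would batch and export; here we only keep a list.
        self.exported.append({"name": name, "duration_ms": duration_ms})


def handle_order(tracer: Tracer) -> str:
    # Business logic only knows the API, not where the data ends up.
    tracer.record("handle_order", 12.5)
    return "ok"


sdk = InMemorySdkTracer()
result_without_sdk = handle_order(NoOpTracer())
result_with_sdk = handle_order(sdk)
print(result_without_sdk, result_with_sdk, sdk.exported)
```

Because `handle_order` only sees the interface, swapping the implementation changes where telemetry goes without touching the business logic.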

APM Logs: How to Get Started for Faster Debugging

Aug 21, 2025 By Anjali Udasi In Last9

When application performance monitoring detects a spike in latency or error rates, the immediate challenge is determining the underlying cause. APM logs address this by correlating performance metrics with the specific log events that occurred at the same time. Instead of switching between monitoring dashboards and manually searching through log files, APM log correlation consolidates both views.
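A minimal sketch of that idea, assuming log events carry a numeric timestamp: given the time window of a latency spike, select the log events that occurred inside it. The `logs_in_window` helper, the field names, and the sample data are hypothetical and chosen only for illustration.

```python
def logs_in_window(logs, start_ts, end_ts):
    """Return log events whose timestamp falls inside [start_ts, end_ts]."""
    return [event for event in logs if start_ts <= event["ts"] <= end_ts]


# Made-up log events; "ts" is an arbitrary numeric timestamp.
logs = [
    {"ts": 100, "level": "INFO", "msg": "request started"},
    {"ts": 205, "level": "ERROR", "msg": "db connection pool exhausted"},
    {"ts": 210, "level": "WARN", "msg": "retrying query"},
    {"ts": 400, "level": "INFO", "msg": "request finished"},
]

# Assumed window in which the latency spike was observed.
spike_start, spike_end = 200, 300

related = logs_in_window(logs, spike_start, spike_end)
for event in related:
    print(event["ts"], event["level"], event["msg"])
```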

Discover Infrastructure: Kubernetes & Hosts - Launch Week / Day 03

Aug 20, 2025 By Last9 - Monitoring for AI Native SDLC In Last9

Stop debugging infrastructure issues across multiple dashboards. See how Last9's Discover Infrastructure monitors K8s pods and traditional hosts together, with resource analysis, pod-level debugging, and AI that correlates app problems to infrastructure root causes. A single setup covering K8s and host monitoring gives complete infrastructure visibility that connects to your services and jobs, with no more blind spots between application performance and underlying resources.

Frontline Reliability: Protecting User Journeys with SLOs with Shery Brauner (Razor, ex-Zalando)

Aug 20, 2025 By Rootly In Rootly

What does it really take to move from firefighting incidents to building reliability at scale? In this episode of Humans of Reliability, Shery Brauner (Razor, ex-Zalando) shares her unique journey from frontend and backend engineering to leading site reliability practices. She explains why protecting the user journey is the key to effective incident management, how SLOs cut through noisy alerts, and why observability must come first.
