
New in OTel: Auto-Instrument Your Apps with the OTel Injector

As distributed systems scale, maintaining manual instrumentation across services quickly becomes unsustainable. The OTel Injector addresses this by automatically attaching OpenTelemetry instrumentation to applications, no code changes needed. This blog covers how the OTel Injector works, how it integrates with Linux environments, and how to set it up for consistent telemetry across your stack.

Why Your Loki Logs Are Disappearing (And How to Fix It)

Grafana Loki is up and running, log ingestion looks healthy, and dashboards are rendering without issues. But when you query logs from a few weeks ago, the data's missing. This is a recurring problem for many teams using Loki in production: while the system handles short-term log visibility well, it often lacks the retention guarantees developers expect for historical analysis and incident review.
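
The fix usually comes down to retention configuration. Here's a minimal sketch, assuming a recent single-binary Loki where the compactor handles retention; field names follow Loki's config schema, so verify them against your version:

    # loki.yaml (sketch): enable retention via the compactor
    compactor:
      retention_enabled: true
      working_directory: /loki/compactor
    limits_config:
      retention_period: 2160h   # keep logs for ~90 days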

How Prometheus 3.0 Fixes Resource Attributes for OTel Metrics

When you export OpenTelemetry metrics to Prometheus, resource fields like service.name or deployment.environment don’t show up as metric labels. Prometheus drops them. To use them in queries, you’d have to join with target_info, which makes filtering and grouping more difficult than necessary. Prometheus 3.0 changes that with resource attribute promotion: it automatically converts OpenTelemetry resource fields into Prometheus labels.
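
A minimal sketch of what that looks like in prometheus.yml; the attribute list is an example, so promote whichever resource attributes you actually query by:

    # prometheus.yml (sketch): promote selected OTLP resource attributes to labels
    otlp:
      promote_resource_attributes:
        - service.name
        - deployment.environment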

OTel Weaver: Consistent Observability with Semantic Conventions

Deploying a new service shouldn’t break dashboards. But it happens, usually because metric names or labels aren’t consistent across teams. You end up with traces that don’t link, metrics that don’t align, and queries that take hours to debug, not because the system is complex, but because the telemetry is fragmented. OTel Weaver addresses this by enforcing OpenTelemetry semantic conventions at the source.
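
As a rough sketch, assuming Weaver's registry commands (flags may differ across versions, so treat this invocation as illustrative), a CI step could validate a team's convention registry before anything ships:

    # validate a local semantic-convention registry (illustrative invocation)
    weaver registry check -r ./semconv-registry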

How sum_over_time Works in Prometheus

The sum_over_time() function in Prometheus adds up every sample a series records within a chosen time window, whether those are gauge readings, histogram samples, or pre-computed rates. Instead of seeing point-in-time values, you get the cumulative total of all data points within your chosen range—useful for calculating totals from rate data, tracking accumulated errors, or understanding resource consumption patterns over custom intervals.
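
For instance, with a hypothetical gauge tracking pending jobs, sum_over_time() turns individual samples into a windowed total:

    # total of every sample recorded in the last hour (hypothetical metric)
    sum_over_time(queue_jobs_pending[1h])

    # accumulated error samples per service over a day
    sum by (service) (sum_over_time(batch_errors[1d]))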

Use Telegraf Without the Prometheus Complexity

Every system needs observability. You need to know what your CPU, memory, disk, and network are doing, and maybe keep an eye on database query latency or Redis connection counts. But setting that up isn’t always simple. You start with a couple of shell scripts. Then come exporters. Then Prometheus. Before long, you’re managing scrape configs, tuning retention, and watching dashboards fail under load after two days of data.

Ship Confluent Cloud Observability in Minutes

You're running Kafka on Confluent Cloud. You care about lag, throughput, retries, and replication. But where do you see those metrics? Confluent gives you metrics, sure, but not all in one place. Some live behind a metrics API, others behind Connect clusters or Schema Registries. You either wire them manually or give up. What if you could stream those metrics to a platform built for high-frequency, high-cardinality time series, and do it in minutes?

Monitor Nginx with OpenTelemetry Tracing

At 3:47 AM, your NGINX logs show a 500 error. Around the same time, your APM flags a spike in API latency. But what's the root cause, and why is it so hard to correlate logs, traces, and metrics? When API response times cross 3 seconds, identifying whether the slowdown is at the NGINX layer, the application, or the database shouldn't require guesswork. That's where OpenTelemetry instrumentation for NGINX becomes essential.
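
A minimal sketch using the native ngx_otel_module that ships with NGINX 1.25+; the collector endpoint is a placeholder:

    # nginx.conf (sketch): emit OTLP traces from NGINX
    load_module modules/ngx_otel_module.so;

    http {
        otel_exporter {
            endpoint collector:4317;   # OTLP/gRPC endpoint
        }
        otel_trace on;
        otel_trace_context propagate;  # forward trace headers upstream
    }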

How to Set Up Real User Monitoring

Synthetic monitoring provides consistent, repeatable results: 2.1s load times, passing Lighthouse scores, and minimal variability. But those numbers reflect lab conditions. On slower networks, like 3G in Southeast Asia, real users may see much higher load times of 5.8s or more. This isn’t a fault of the tools; it’s a difference in testing context. Synthetic tests run on fast machines, stable connections, and clean environments.

Set Up ClickHouse with Docker Compose

ClickHouse is built for high-performance OLAP workloads, capable of scanning billions of rows in seconds. If your analytical queries are bottlenecked on PostgreSQL or MySQL, or you're burning too much on Elasticsearch infrastructure, ClickHouse offers a faster and more cost-efficient alternative. This blog walks through setting up ClickHouse locally with Docker Compose and scaling toward a production-grade cluster with monitoring in place.
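
A minimal single-node sketch to start from; the image tag is an assumption and the ports are the common defaults, so adjust as needed:

    # docker-compose.yml (sketch): single-node ClickHouse
    services:
      clickhouse:
        image: clickhouse/clickhouse-server:24.8
        ports:
          - "8123:8123"   # HTTP interface
          - "9000:9000"   # native TCP protocol
        volumes:
          - clickhouse-data:/var/lib/clickhouse
        ulimits:
          nofile: 262144  # ClickHouse needs a high open-file limit
    volumes:
      clickhouse-data: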

Stream AWS Metrics to Grafana with Last9 in 10 Minutes

It’s 2:47 AM and your Lambda functions are timing out. API response times are spiking. You’re flipping between the CloudWatch console, your APM tool, and your logs, trying to figure out what’s going wrong. CloudWatch has the metrics you need: CPU usage, memory pressure, and request rates — but connecting that data to what your app is doing takes time. The delay in stitching it all together slows down your incident response.

Query and Analyze Logs Visually, Without Writing LogQL

It’s 2 AM. An incident’s in progress. Error rates are climbing. You jump into the logs, filter by service, adjust the time window… and now you need a LogQL query. You write one. It errors out. You fix the syntax, try again, only to realize you need a different filter or a new aggregation. Back to rewriting. By the time you’ve got the query right, you’ve already lost 10–15 minutes. The system is still broken, and you still don’t know why.

Trace Go Apps Using Runtime Tracing and OpenTelemetry

When your Go service hits 500ms latencies but CPU usage is flat, tracing gives you visibility into what the profiler misses. With 1–2% runtime overhead, Go’s built-in tracing tools make it easier to debug performance regressions that don’t leave a clear footprint.
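
The entry point is the standard library's runtime/trace package; a minimal sketch:

    // sketch: capture an execution trace, then inspect with `go tool trace trace.out`
    package main

    import (
        "log"
        "os"
        "runtime/trace"
    )

    func main() {
        f, err := os.Create("trace.out")
        if err != nil {
            log.Fatal(err)
        }
        defer f.Close()

        // records scheduling, GC, and blocking events until Stop is called
        if err := trace.Start(f); err != nil {
            log.Fatal(err)
        }
        defer trace.Stop()

        // ... run the workload you want to trace ...
    }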

Kibana Logs: Advanced Query Patterns and Visualization Techniques

Kibana gives you a structured way to explore log data indexed in Elasticsearch. With the right queries and visualizations, you can identify anomalies, debug issues more quickly, and track trends across services. This blog covers practical ways to query logs using Kibana’s Lucene and KQL syntax, build visualizations that surface meaningful signals, and set up dashboards for ongoing log-based monitoring.
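
For a taste of the query side, here's the same filter in both syntaxes; the field names assume a typical ECS-style log schema:

    # KQL
    service.name : "checkout" and http.response.status_code >= 500

    # Lucene
    service.name:"checkout" AND http.response.status_code:[500 TO *]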

Enable Kong Gateway Tracing in 5 Minutes

Kong Gateway is a popular API gateway that sits at the edge of your infrastructure, routing and shaping traffic across microservices. It’s fast, pluggable, and battle-tested, but for many teams, it remains a black box. You might have OpenTelemetry set up across your application stack. Traces flow from your app servers, databases, and third-party APIs. But the moment a request enters through Kong, observability drops off.
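
Enabling Kong's bundled OpenTelemetry plugin through the Admin API looks roughly like this; the config field is traces_endpoint on recent Kong releases and endpoint on older ones, so treat the exact names as assumptions to verify against your version:

    # sketch: turn on the OpenTelemetry plugin globally
    curl -X POST http://localhost:8001/plugins \
      --data "name=opentelemetry" \
      --data "config.traces_endpoint=http://collector:4318/v1/traces"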

Build Log Automation with Last9's Query API

Manual log investigation is one of those engineering tasks that quietly drains hours without offering much real value. You're debugging an incident. Monitoring shows elevated error rates. Now begins the familiar drill of querying, filtering, and cross-referencing logs by hand. It’s a tedious cycle, and it doesn’t scale. The whole process breaks down when you’re trying to automate incident response, run continuous security monitoring, or generate compliance reports.

Jaeger Metrics: Internal Operations and Service Performance Monitoring

You're monitoring a microservices-based system. Alerts trigger when response times exceed 2 seconds. But when you open Jaeger, you're faced with thousands of traces. Identifying which service or operation is responsible becomes time-consuming. Jaeger metrics help reduce this friction by exposing aggregated telemetry. Instead of scanning individual traces, you get service-level and operation-level performance metrics: latency, throughput, and error rates that highlight where the issue lies.

How to Get Grafana Iframe Embedding Right

Adding Grafana dashboards directly into your app lets users see monitoring data without switching tabs or tools. Using an iframe to embed Grafana does work, but it brings along some tricky authentication and security issues that aren’t always obvious at first. In this blog, we’ll go over the practical ways to embed Grafana dashboards from easy public snapshots to secure, private dashboards that need authentication.
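
The basic mechanics are a d-solo panel URL inside an iframe, with embedding enabled on the Grafana side; the dashboard UID and panel ID below are placeholders:

    <!-- embed a single panel -->
    <iframe
      src="https://grafana.example.com/d-solo/abc123/service-health?orgId=1&panelId=2&from=now-6h&to=now"
      width="600" height="300" frameborder="0"></iframe>

    ; grafana.ini: allow Grafana to be framed by other origins
    [security]
    allow_embedding = true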

Optimize LangChain Performance with Trace Analytics

You’ve instrumented your LangChain app, and traces are now flowing into Last9. Now the issues are visible: API costs are crossing $200/day, average response times exceed 3 seconds, and performance degrades under 100 concurrent users. A single tool call adds over 2 seconds. Bloated context windows are pushing up token usage, wasting $50/day. Here’s how to use trace data to identify and fix these inefficiencies, systematically and at scale.

Elasticsearch with Python: A Detailed Guide to Search and Analytics

If you’re using Python for search, log aggregation, or analytics, you’ve probably worked with Elasticsearch. It’s fast, scalable, and fairly complex once you go beyond the basics. The official Python client gives you raw access to Elasticsearch’s REST API. But getting it to work the way you want, especially under load, can be tricky. This blog walks through practical ways to index, query, and monitor Elasticsearch from Python code, without getting lost in the docs.
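
A minimal sketch with the official client (elasticsearch 8.x API; the index name is a placeholder):

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    # index a document
    es.index(index="app-logs", document={"level": "error", "service": "checkout"})

    # search for error-level logs
    resp = es.search(index="app-logs", query={"match": {"level": "error"}})
    for hit in resp["hits"]["hits"]:
        print(hit["_source"])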

Cloud Log Management: A Developer's Guide to Scalable Observability

As systems move to microservices, serverless, and multi-cloud setups, debugging gets harder. You’re no longer dealing with a single log file; you’re looking at logs from dozens of services, running across different environments. Traditional debugging methods like SSH-ing into servers or adding print statements don’t scale in these environments. Cloud log management tools help by collecting logs from all your services into one place.

What Are Log Loss and Cross-Entropy

You're building a classification model, and your framework throws around terms like "log loss" and "cross-entropy loss." Are they the same thing? When should you use binary cross-entropy versus categorical cross-entropy? What about focal loss? This blog breaks down these loss functions with practical examples and real-world implementations.
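
For reference, binary cross-entropy (log loss) over N examples, with true labels y_i in {0, 1} and predicted probabilities p_i, is:

    L = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \right]

Categorical cross-entropy generalizes this to K classes by summing y_{i,k} \log p_{i,k} over classes, which is why the two names are used interchangeably in the binary case.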

How to Get Logs from Docker Containers

When a container misbehaves, logs are the first place to look. Whether you're debugging a crash, tracking API errors, or verifying app behavior—docker logs gives you direct access to what's happening inside. This blog covers the full workflow: how to retrieve logs, filter them by time or service, and set up logging for production environments.
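
The most common invocations, with the container name as a placeholder:

    docker logs --tail 100 api        # last 100 lines
    docker logs -f api                # follow new output
    docker logs --since 1h api        # only the past hour
    docker compose logs -f api        # per-service logs under Compose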

Troubleshooting LangChain/LangGraph Traces: Common Issues and Fixes

We’ve covered how to get LangChain traces up and running. But even when everything’s instrumented, traces can still go missing, show up half-broken, or look nothing like what you expected. This guide is about what happens after setup, when traces exist, but something’s off.

Improve Consistency Across Signals with OTel Semantic Conventions

It’s 2 AM. Your API is timing out. Logs show a slow query. Metrics flag a spike in DB connections. Traces reveal a 5-second delay on a database call. But then the questions start:

- Which database?
- Does the query match the delay?
- Why doesn’t this align with the connection pool metrics?

Each tool uses different labels: db.name, database, sometimes nothing at all. Without a shared schema, connecting the dots is slow and frustrating.
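
Semantic conventions fix this by giving every signal the same vocabulary for the same thing. For a database call, the standard attributes look like this; names track the OTel semconv registry and have shifted across versions:

    # the same database call, described identically in spans, metrics, and logs
    db.system    = "postgresql"
    db.name      = "orders"        # renamed db.namespace in newer semconv versions
    db.operation = "SELECT"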

How Replicas Work in Kubernetes

Replicas in Kubernetes control how many copies of your pods run simultaneously. They're the foundation of scaling, availability, and recovery in your cluster. When you're running a stateless API or a background worker, understanding how replicas work directly impacts your application's reliability and performance. This blog walks through replica management, from basic concepts to production monitoring patterns that help you maintain healthy, scalable applications.
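
In practice, that's one field on a Deployment; a minimal sketch where the names and image are placeholders:

    # deployment.yaml (sketch)
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: api
    spec:
      replicas: 3            # run three identical pods
      selector:
        matchLabels:
          app: api
      template:
        metadata:
          labels:
            app: api
        spec:
          containers:
            - name: api
              image: example/api:1.0

From there, kubectl scale deployment api --replicas=5 adjusts the count on the fly.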

Instrument LangChain and LangGraph Apps with OpenTelemetry

In our previous blog, we talked about how LangChain and LangGraph help structure your agent’s behavior. But structure isn’t the same as visibility. This one’s about fixing that. Not with more logs. Not with generic dashboards. You need to see what your agent did, step by step, tool by tool, so you can understand how a simple query turned into a long, expensive run.

Prometheus Group By Label: Advanced Aggregation Techniques for Monitoring

Your Prometheus dashboard shows 847 CPU metrics. The alert fired—but is the problem in us-east or us-west? You're trying to rule out whether that new feature caused a latency spike, but the sheer number of time series isn’t helping. Grouping can make this manageable. By organizing metrics by shared label values, you can quickly spot which service or region is behaving differently, without digging through every metric.
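
For example, collapsing those hundreds of series down to one line per region; the metric and label names here are assumptions:

    # error rate per region
    sum by (region) (rate(http_requests_total{status="500"}[5m]))

    # p99 latency per service
    histogram_quantile(0.99,
      sum by (le, service) (rate(http_request_duration_seconds_bucket[5m])))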

Docker Status Unhealthy: What It Means and How to Fix It

If your container shows Status: unhealthy, Docker's health check is failing. The container is still running, but something inside, usually your app, isn’t responding as expected. This doesn’t always mean a crash. It just means Docker can’t verify the app is working. Here’s how to debug the issue and restore the container to a healthy state.
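
Two commands usually pinpoint it; the container name is a placeholder, and the curl target depends on what your HEALTHCHECK actually runs:

    # see the last few health-check results and their output
    docker inspect --format '{{json .State.Health}}' api

    # re-run the check by hand inside the container
    docker exec -it api curl -f http://localhost:8080/health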

LangChain Observability: From Zero to Production in 10 Minutes

LangChain apps are powerful, but they’re not easy to monitor. A single request might pass through an LLM, a vector store, external APIs, and a custom chain of tools. And when something slows down or silently fails, debugging is often guesswork. In one instance, a developer ended up with an unexpected $30,000 OpenAI bill, with no visibility into what triggered it. This blog shows how to avoid that using OpenTelemetry and LangSmith, with a setup that traces every step of a request.

LangChain & LangGraph: The Frameworks Powering Production AI Agents

Your AI agent worked flawlessly in development, with fast responses, clean tool use, and nothing out of place. Then it hit production. A simple "What's our pricing?" query triggered six API calls, took 8 seconds, and returned the wrong answer. No errors. No stack traces. Unlike traditional systems, AI agents don't crash, they drift. They make poor decisions quietly, and your monitoring says everything's fine.

How to Run Elasticsearch on Kubernetes

Elasticsearch stands as one of the most robust open-source search engines available today. Built on Apache Lucene, it handles complex search operations, real-time analytics, and large-scale data processing with impressive speed and accuracy. Kubernetes has transformed how we deploy and manage containerized applications. This orchestration platform automates deployment, scaling, and operations of application containers across clusters of hosts.

Logging in Docker Swarm: Visibility Across Distributed Services

Docker Swarm's logging model shifts from individual container logs to service-level aggregation. The docker service logs command batch-retrieves logs present at the time of execution, pulling data from all containers that belong to a service across your cluster. This approach gives you a unified view of distributed applications, but it comes with its own patterns and considerations for effective observability.
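
A few useful variations, with the service name as a placeholder:

    docker service logs my-stack_api                # everything currently retrievable
    docker service logs --tail 200 -f my-stack_api  # recent lines, then follow
    docker service logs --since 30m my-stack_api    # a specific time window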

How to Write Logs to a File in Go

When your Go application moves beyond development, you need structured logging that persists. Writing logs to files gives you the control and reliability that stdout can't match, especially when you're debugging production issues or need to meet compliance requirements. This blog walks through the practical approaches, from Go's standard library to structured logging with popular packages.
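
The standard-library version is only a few lines; a minimal sketch:

    package main

    import (
        "log"
        "os"
    )

    func main() {
        // O_APPEND so restarts don't truncate existing logs
        f, err := os.OpenFile("app.log", os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644)
        if err != nil {
            log.Fatal(err)
        }
        defer f.Close()

        logger := log.New(f, "", log.LstdFlags|log.Lshortfile)
        logger.Println("service started")
    }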