Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on Service Reliability Engineering and related technologies.

OpenTelemetry Visualization Setup: A Developer's Guide

If you've ever tried to set up OpenTelemetry visualization, you know it can be a bit overwhelming. But don't worry—in this guide, we'll break it all down step by step. Whether you're just getting started or looking to fine-tune your existing setup, this walkthrough will help you get the most out of your telemetry data.

How to Use OpenSearch with Python for Search and Analytics

If you're working with search and analytics, you’ve probably heard about OpenSearch—the open-source alternative to Elasticsearch. OpenSearch is a powerful tool, whether you're building a search engine, running log analytics, or implementing full-text search in your applications. And the best part? You can integrate it easily with Python.

An In-Depth Guide to Java Performance Monitoring for SREs

If you've ever had a Java application slow down in production and struggled to pinpoint the cause, you know the pain of performance issues. Java is a powerful, high-level language, but it doesn’t come without challenges—especially when it comes to resource management, garbage collection, and thread handling. This guide will take you through everything you need to know about Java performance monitoring, from key metrics to tools and best practices.

OpenTelemetry UI: The Ultimate Guide for Developers

If you’ve ever struggled with understanding distributed traces, managing metrics, or debugging complex applications, OpenTelemetry is your best friend. But what about the OpenTelemetry UI? How do you visualize and interact with all that telemetry data? In this guide, we’ll explore the best ways to use OpenTelemetry’s UI options, from setting up a proper observability stack to choosing the right front-end visualization tools.

Integrating OpenTelemetry with Grafana for Better Observability

Modern application observability is essential for ensuring system performance, diagnosing issues, and optimizing user experiences. OpenTelemetry (Otel) and Grafana serve as two key components in achieving end-to-end visibility. While OpenTelemetry focuses on instrumenting applications to collect telemetry data, Grafana specializes in visualizing this data, making it actionable and insightful.

Helm vs Terraform: A Detailed Comparison for Developers

When managing infrastructure and deploying applications in a cloud-native environment, two popular tools that developers often compare are Helm and Terraform. While both are used to automate deployments, they serve different purposes and operate in distinct ways. Understanding the differences can help you make the right choice for your use case.

A Quick Guide for OpenTelemetry Python Instrumentation

OpenTelemetry is an open-source tool that helps you keep an eye on your application’s performance. Whether you’re building microservices, using serverless setups, or working with a traditional monolithic app, it’s crucial to monitor and trace your app’s behavior for debugging and optimization. OpenTelemetry's Python instrumentation is an excellent way to track traces, metrics, and logs across your entire app.

Tomcat Logs: Locations, Types, Configuration, and Best Practices

Apache Tomcat logs are essential for monitoring, debugging, and maintaining Java applications running on Tomcat. These logs capture critical information such as server startup details, request handling, and application errors. They help developers and system administrators troubleshoot issues, analyze traffic, and ensure application stability. Tomcat generates multiple logs, each serving a distinct purpose.

AI in Production with GitHub's Sean Goedecke

In this episode, we sit down with Sean Goedecke, Staff Software Engineer at GitHub, to discuss where LLMs fit into real-world development. Sean shares how he’s using LLMs how he’s drawing the line for AI-assistance in the codebases he manages—though, as he says, this might all change by next summer. Sean also weighs in on how LLMs could assist SREs during outages—especially when you’re only half-awake at 3 a.m. after a rather inconvinient page.