
Graviton5 in Production at Honeycomb: Per-service Results From the m8g to m9g Migration

This is the fourth installment in the Graviton retrospective series we've been writing since 2021. The methodology is the same one I always reach for: hold the workload constant, run both generations concurrently in the same Kubernetes namespace, and let the per-pod numbers speak.

What Is Enterprise Service Management (ESM)? Explained

Enterprise service management (ESM) applies the proven model of IT service management (catalogs, workflows, self-service, and SLAs) to the whole business: HR, facilities, finance, and more. In this explainer we define ESM, show how it works across departments, clarify how it builds on and differs from ITSM, and cover the mistake most teams make: copying IT ticket forms instead of orchestrating work across teams.

Kubeflow MLOps tutorial: from notebook development to production inference

In this video, our engineering team takes you through a full end-to-end Kubeflow implementation, step by step – from data exploration to production inference. Follow the journey of a house price prediction use case and see how modern MLOps components work together: Kubeflow architectures and starter repositories; notebook-based development workflows; data exploration and model development; MLflow for experiment tracking; Katib for hyperparameter optimization; Kubeflow Pipelines for automated preprocessing and training; and KServe for scalable model inference.

Grafana Tempo: The distributed tracing journey to 3.0 (June 2026 Community Call)

Our distributed tracing journey from the inception of Tempo to 3.0. Grafana Cloud is the easiest way to get started with Grafana dashboards, metrics, logs, traces, and profiles.

Tap-to-call | OnPage New Feature Release

Introducing Tap-to-Phone Call in OnPage. When critical incidents require more than messaging, teams need a fast way to connect. With Tap-to-Phone Call, users can call group members directly from within an OnPage conversation. By simply tapping the phone icon, responders can move from secure messaging to live voice coordination over their mobile carrier network, helping teams communicate faster when every second counts.

Shipped: Catch the runaway agent while it's still running.

AI spend has no ceiling. An engineer can burn $5,000 in an hour, and a team that spins up an agent on Friday can loop it on a bad prompt all weekend. You find out when the bill lands: the money is already gone, and the damage has to be pieced back together from logs. Cloud spend had a natural limit. Tokens don’t. Now you see it as it happens. Connect a source and the calls stream in within seconds. Within minutes they’re broken out by model, provider, agent, and user.