
Why dashboards still matter in the age of AI

I recently gave a talk at Experts Live India 2026 about SquaredUp, and even before getting into the demo, there was one question I knew I had to address: Is the dashboard era over? It's something we're all hearing more. "Just ask AI." "Agentic AI will build your dashboards automatically." "Why bother with static views when a chatbot can answer anything?" It's a fair question. Answering it requires a clear understanding of what a dashboard represents.

What Is Network Operations Center (NOC)

Quick Answer: A Network Operations Center (NOC), pronounced “knock”, is a centralized physical or virtual facility where IT professionals monitor, manage, and maintain an organization’s network infrastructure on a 24/7/365 basis. The NOC serves as the nerve center for detecting incidents, coordinating responses, and ensuring maximum network availability and performance.

Faster fixes, less context sharing: how Grafana Assistant learns your infrastructure before you even ask

When an unexpected alert fires these days, most engineers' first move is to ask their AI assistant for help. You ask why your checkout service is slow and the assistant gets to work, but it can't get any meaningful insights, at least not quickly, without the proper guidance. So, the next thing you know, you're sharing details about your existing data sources, the services you have running, how they connect, which labels and metrics matter, and on and on.

Rollbar Pricing Explained: Plans, Features, and What You Actually Pay

You’re comparing error monitoring tools. You’ve narrowed it down to two or three options. Now you need to know what this actually costs before you bring it to your team. Here’s what Rollbar costs, what’s included at each tier, and how it compares to Sentry and Datadog on pricing. No sales pitch, just the math.

What's New in InfluxDB 3 Explorer 1.8: Streaming Subscriptions, Smarter Sample Data, Line Protocol Validation, and Retention Controls

InfluxDB 3 Explorer 1.8 is all about writing data and keeping it under control. You can now subscribe to MQTT, Kafka, and AMQP streams directly from Explorer, generate custom sample datasets, stream live sample data continuously into your database, and validate your line protocol and preview the resulting schema before you write it. You can now also view and edit retention periods on both databases and individual tables.

How to use an SRE agent to reduce downtime

An alert in the middle of the night warns of a potential business failure. Manual incident response becomes more complex due to the overwhelming data from distributed and dynamic digital services. With an SRE agent, your engineering team can cut through alert clutter. They can sort through various signals more quickly, decreasing burnout and achieving faster, more affordable resolutions. Operational resilience will see its next evolution with Agentic AI.

7 best AI deployment platforms for production Kubernetes workloads in 2026

Training a model in a notebook is easy. What breaks teams is the step after: serving it reliably without haemorrhaging cloud budget or burying your SREs in YAML. The common trap is picking a platform that handles the model but not the surrounding stack. An AI deployment platform should orchestrate the full application graph (inference endpoints, vector databases, caching layers, and frontends) inside a single VPC, with GPU autoscaling that doesn't require a dedicated platform engineer to babysit.

ActiveMQ MQTT Protocol Setup Guide: QoS, SSL, and IoT Scale

Modern enterprise architectures increasingly need to bridge the gap between resource-constrained IoT devices and heavyweight enterprise backend systems. ActiveMQ MQTT support makes this possible: devices running the MQTT protocol (sensors, actuators, edge nodes) publish telemetry on standard topics, while JMS-based backend services consume and process the data without any client-code changes.

How to Test SQS Workflows Locally with LocalStack and OpenTelemetry

LocalStack lets you run SQS, Lambda, and S3 locally in Docker, but there's a hidden trap: OpenTelemetry's default AWS propagator doesn't work with free LocalStack. Here's how to set up end-to-end local testing with working trace propagation. Prathamesh works as an evangelist at Last9, runs SRE stories (where SRE and DevOps folks share their stories), and maintains o11y.wiki, a glossary of all terms related to observability.

Four types of incident alerts every team should know

Not every incident alert needs the same kind of response. One incident may need to wake someone up right away. Another may simply need to be picked up when the team starts work in the morning. Without a clear way to tell them apart, every incident feels equally urgent. That usually adds noise and makes incident response decisions harder than they need to be. This is where two questions help. In this guide, we’ll discuss what those questions mean and the four combinations that follow.