The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Deployment Tracking with Mezmo Live Streaming Tail

You've deployed a new feature into production. You've done your unit testing, fixed lots of bugs, and your code is awesome. Now it's time for hundreds, thousands, or millions of users to break... err... use your feature. You're diligent about tracking usage in real time and getting customer feedback when something goes wrong. You track the performance and response-time impact on the server. All is good... except that feature isn't quite working for a specific group of users. Now what?

Observability and IT Monitoring for Federal, State, and Local Government | LogicMonitor

If you work in public sector IT, whether at the federal, state, or local level, you know how complex things have gotten. Between aging infrastructure, hybrid cloud environments, and growing cybersecurity demands, keeping everything running smoothly is a daily challenge. LogicMonitor's AI-powered hybrid observability platform helps government IT teams simplify monitoring, reduce alert noise, and use AI-driven insights to avoid issues. Read on to see how observability helps agencies.

LogicMonitor Achieves FedRAMP "In Process" Status: AI-powered Hybrid Observability for Government Agencies

Throughout my career working with government agencies, I’ve seen firsthand how critical it is to have monitoring solutions that meet federal security requirements while delivering the visibility needed to manage complex IT environments. That’s why I’m particularly proud to announce that LogicMonitor has reached a significant milestone in its commitment to serving government agencies and public sector organizations.

Adaptive Metrics in Action: How The Trade Desk Optimized Observability Costs | Grafana Labs

Managing observability costs at scale is no easy task, especially when metrics volume grows fast. In this talk, Paul Givens, Head of Observability at The Trade Desk, shares how they implemented Adaptive Metrics to control costs without sacrificing visibility. The talk covers how Adaptive Metrics works to reduce cardinality and cost, real-world implementation lessons from a high-scale AdTech environment, and key takeaways for teams managing large Prometheus-like metric sets.

New Google Cloud Run Visualization in Grafana Cloud | Demo | How to Monitor Google Cloud Run

Perfect for troubleshooting, performance tuning, and cost optimization, this new feature helps you stay in control of your Cloud Run workloads. With this sophisticated dashboard, you can monitor CPU, memory, network traffic, and active requests at a glance; drill down into individual services and containers with a single click; identify resource usage spikes and optimize performance; and use the Right-Sizing View to find the most resource-heavy services and containers.

How we got abused via OTP

Going through my emails, I saw several about Twilio's auto-recharge, and then something about a suspension. We were using Twilio to send SMS messages and phone call alerts. "That's odd, let me check!" I logged into Twilio from my phone and checked. Horror. Instant horror. The balance was insane. But negative. I told my friend I needed to sit down and check something, pulled out my laptop, and logged in. Same information. Same insane balance. Right then and there I knew it... we'd been abused.

Essential Steps for Troubleshooting Network Problems

Everyone has a story about that one road trip where traffic backed up and made everyone late to the event. When you have network connectivity problems, your information highway gets clogged, making it difficult for users to access resources efficiently. Network troubleshooting strategies may seem simple, but a lot of nuance and complexity emerges once you dig into your data.

Simplifying Multi-Node Setups with InfluxDB 3 Enterprise Modes

As your time series data grows, managing increasing workloads can quickly become a headache. High data ingestion rates, numerous (and complex) queries, intensive processing tasks, and routine maintenance like data compaction often compete for limited resources. This leads to unpredictable performance and slower response times, and common solutions often introduce operational complexity.