
The latest news and information on monitoring for websites, applications, APIs, infrastructure, and other technologies.

Build on Your Microsoft SCOM Foundation

Enterprises that rely on Microsoft System Center Operations Manager (SCOM) as their monitoring backbone often share an everyday reality: the bigger the environment, the bigger the challenges. Noisy alert storms can bury critical issues. Management Packs (MPs) require ongoing care and expertise to deliver accurate insights. And without consistent reporting, teams risk slipping into reactive fire-fighting instead of strategic monitoring.

Telemetry Now Teaser: "Turning Network Telemetry Into Financial Insight"

Network operators treat cost, performance, security, and reliability as their core needs. But how do they get the economic data to make tradeoffs when one of these needs suffers? Tune in to the latest Telemetry Now with special guest Lauren Basile to learn how Kentik Traffic Costs is providing data-backed answers to these questions.

How We Built VictoriaLogs Cluster: A CTO's

Go behind the scenes with the VictoriaMetrics team! In this special talk, Marc Sherwood is joined by our CTO, Alexander Marshalov, to explore our powerful, open-source logging solution, VictoriaLogs. This isn't just a feature showcase. This is a deep dive into the engineering mindset that drives our development. Alexander shares firsthand insights into why we built VictoriaLogs Cluster, the technical challenges of creating a distributed system for logs, and the core principles of simplicity and efficiency that guide our architecture.

How GenAI Is Empowering Elastic Workforce

With over 10,000 questions answered and a 99% satisfaction rate in just 90 days, ElasticGPT, our internal generative AI assistant built on Elastic’s Search AI Platform, is transforming how our teams find information, make decisions, and complete day-to-day tasks. Matt Minetola, CIO, explains how ElasticGPT helps employees access company knowledge faster using natural language queries. Learn how we’re using retrieval augmented generation (RAG) and a secure, scalable architecture to deliver trusted, real-time AI experiences across the organization.

Model your architecture with custom entities in the Datadog Software Catalog

Every software organization has its own unique architecture and workflows. Beyond services and APIs, teams rely on internal libraries, CI/CD jobs, data pipelines, AI agents, and more to keep systems running smoothly. But as architectures grow more complex and interconnected, it can become difficult to keep track of all the structural dependencies and interactions in one place.

Accelerating SIEM Migration with AI-Native Data Pipelines

Security teams are increasingly realizing that yesterday’s SIEMs weren’t built for today’s world. Legacy platforms were designed for static, on-prem environments where data sources were relatively predictable and volumes were manageable. But the shift to cloud, SaaS, and dynamic workloads has completely changed the equation. Cloud-friendly, flexible, and cost-conscious SIEMs are now table stakes.

Why Does Your Node.js App Crash in Production and How Can You Fix it?

Node.js has become one of the most popular platforms for building scalable and high-performance web applications. Its event-driven, non-blocking I/O model allows developers to efficiently handle thousands of concurrent connections with minimal overhead. However, many businesses still face a critical challenge: Node.js applications often crash unexpectedly in production environments, causing downtime, lost revenue, and damage to brand reputation.

The telemetry time bomb - and what to do about it

Telemetry data is growing at an average of 29% a year — doubling costs every 18 months. That’s putting pressure on ITOps budgets, observability platforms, SecOps teams, and SIEM deployments alike. In this post, we’ll explore how unchecked data volumes, siloed tools, and aging architectures are creating a telemetry cost crunch that limits visibility, slows both troubleshooting and threat detection, and impacts business outcomes.