The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Getting Started with Splunk Dashboards

Splunk is a leading platform for searching, monitoring, and analyzing logs across IT tools and systems. Well known for its ability to handle vast volumes of log and event data, Splunk empowers organizations to gain real-time visibility into their systems and operations. However, while Splunk offers rich telemetry and analytics, its dashboards can become complex, making it difficult to surface the most critical insights quickly. That’s where SquaredUp can elevate the experience.

How to Choose the Right API Monitoring Tool for Production Environments

APIs are no longer just technical connectors between systems; they are production infrastructure. Customer-facing applications, partner integrations, payment flows, and internal microservices all depend on APIs working correctly, consistently, and at scale. When an API fails, the impact is rarely limited to a single endpoint; it can disrupt user journeys, compromise revenue, and breach service-level agreements (SLAs).

6 Common Factors That Influence Fleet Safety Program Success

Building a safer fleet is not about one silver bullet. It is a set of practical choices that add up, day after day, until safer habits and smarter tools become the way you operate. This article breaks the work into six factors you can act on. Each one is designed to be simple to start, measurable to manage, and durable enough to last when operations get busy.

Now available: More monitor history

We’re excited to roll out an improvement many of you have been asking for: extended historical metrics for website and ping monitors. Until now, monitor metrics like availability, downtime, and response times were limited to the last 24 hours. While useful for short-term checks, this made it harder to spot trends, investigate intermittent issues, or understand long-term performance. That changes today.

How Alerting Works in SolarWinds Observability Self-Hosted

This training video from SolarWinds Academy provides a high-level overview of how the alerting process works within SolarWinds software. Technical trainer Cheryl Nomanson explains the step-by-step workflow, starting with the alerting engine continuously scanning the database for conditions that meet alert trigger thresholds. She covers how triggered elements are evaluated for suppressions (like time-of-day restrictions and scoping), and explains that only fully qualified conditions become actual alerts. The video details how alerts always display in the web console and may trigger additional actions like emails or scripts.
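
The workflow described in the video can be pictured roughly as a three-stage pipeline: check the trigger condition, check for suppressions, then raise the alert and run any extra actions. The sketch below is illustrative only and is not SolarWinds code. Every name in it (Element, meets_trigger, is_suppressed, run_cycle) is hypothetical, the CPU threshold is made up, and the business-hours window merely stands in for a time-of-day restriction.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class Element:
    """A monitored element (hypothetical stand-in for a database row)."""
    name: str
    cpu_percent: float
    in_scope: bool  # hypothetical stand-in for alert scoping


def meets_trigger(element, threshold=90.0):
    """Stage 1: does the element meet the alert trigger threshold?"""
    return element.cpu_percent >= threshold


def is_suppressed(element, now, start=time(8, 0), end=time(18, 0)):
    """Stage 2: suppress when out of scope or outside the allowed time window."""
    return not element.in_scope or not (start <= now <= end)


def run_cycle(elements, now, extra_actions=()):
    """Run one evaluation pass and return what the console would display."""
    console = []
    for element in elements:
        if not meets_trigger(element) or is_suppressed(element, now):
            continue  # only fully qualified conditions become alerts
        alert = f"{element.name}: CPU at {element.cpu_percent:g}%"
        console.append(alert)          # alerts always appear in the console
        for action in extra_actions:   # optional extras, e.g. email or script
            action(alert)
    return console
```

Calling run_cycle with a list of elements, a time of day, and an optional list of callables returns the alerts that survived both stages. Passing something like a list's append method as an extra action is enough to see which alerts would have triggered a follow-up.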

How to Create an SNMP Poller in SolarWinds Observability Self-Hosted

SolarWinds technical trainer Cheryl Nomanson presents a systematic approach to optimizing and building custom SNMP pollers. The tutorial walks through a step-by-step process starting with adding devices for SNMP monitoring using default pollers, then identifying missing metrics and checking if the required OIDs exist. If OIDs don't exist, she explains how to use alternative OIDs or data transformation tools.

Networking Technology Trends for 2026

From an IT pro’s perspective, the future of networking technology in 2026 is a mixed bag of potential and security risk. New wireless tech, agentic AI, and the increased distribution of networks are enabling new use cases and helping automate toil, but they also create new attack surfaces and risk profiles. In this article, we’ll take a look at the ten network security trends we’re most excited about in 2026 and provide key insights about what each one means for IT and MSP teams.

Integrating Prometheus Metrics into Icinga Using check_prometheus

This article explains how to integrate metrics from Prometheus into Icinga checks using the check_prometheus plugin. There are several reasons you might want this: maybe different teams run their own monitoring systems and you need to bridge the gap, or you want to run queries that are better expressed in Prometheus than in plain Icinga check plugins. The latter is often the case when you want to aggregate data from multiple sources or take historical data into account.
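
As a rough illustration of the bridging idea, and not the check_prometheus plugin's actual implementation, the sketch below maps a Prometheus instant-query response onto the exit codes Icinga expects from a check plugin (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function name, the thresholds, and the sample payload are hypothetical. The sketch also assumes that higher values are worse and that the worst series decides the state.

```python
import json

# Standard check-plugin exit codes understood by Icinga.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate_response(body: str, warn: float, crit: float) -> tuple[int, str]:
    """Map a Prometheus instant-query JSON body to (exit_code, message).

    Hypothetical helper: higher values are treated as worse.
    """
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return UNKNOWN, "UNKNOWN - response is not valid JSON"

    if not isinstance(payload, dict) or payload.get("status") != "success":
        return UNKNOWN, "UNKNOWN - query failed"

    results = payload.get("data", {}).get("result", [])
    if not results:
        return UNKNOWN, "UNKNOWN - query returned no series"

    # Each vector sample looks like {"metric": {...}, "value": [ts, "<number>"]}.
    worst = max(float(sample["value"][1]) for sample in results)

    if worst >= crit:
        return CRITICAL, f"CRITICAL - value is {worst:g}"
    if worst >= warn:
        return WARNING, f"WARNING - value is {worst:g}"
    return OK, f"OK - value is {worst:g}"


# Made-up payload in the shape of a /api/v1/query vector result.
SAMPLE = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"instance": "a"}, "value": [1700000000, "0.42"]},
            {"metric": {"instance": "b"}, "value": [1700000000, "0.91"]},
        ],
    },
})

code, message = evaluate_response(SAMPLE, warn=0.8, crit=0.95)
print(message)
```

A real plugin would fetch the body from the Prometheus HTTP API and finish by exiting with the returned code so that Icinga can read the state; both steps are left out here to keep the sketch self-contained.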

The Incident Checklist: Reducing Cognitive Load When It Matters Most

In the previous post, we looked at what happens after detection, when incidents stop being purely technical problems and become human ones, with cognitive load as the real constraint. This post assumes that context. The question here is simpler and more practical: what actually helps teams think clearly and act well once things are already going wrong? One answer, used quietly but consistently by high-performing teams, is the checklist.