Latest News

Top 12 SolarWinds Competitors and Alternatives In 2024

Organizations exploring SolarWinds alternatives often face a critical decision when choosing the right network and infrastructure monitoring solution. While SolarWinds has established itself as a reliable industry standard, companies are increasingly seeking alternatives that offer better alignment with their monitoring needs, budget constraints, and security requirements.

Did Delta's slow web performance signal trouble before CrowdStrike?

The CrowdStrike outage was a reminder of how quickly the dominoes can fall, especially when the foundation is shaky. Delta Air Lines was hit harder than its competitors. While United and American Airlines were able to recover within days, Delta faced ongoing struggles, leading to the cancellation of 7,000 flights over five days.

Tracing the Line: Understanding Logs vs. Traces

In the software space, we spend a lot of time defining the terminology that describes our roles, implementations, and ways of working. These terms help us share fundamental concepts that improve our software and let us better manage our software solutions. To optimize your software solutions and help you implement system observability, this blog post will share the key differences between logs and traces.

Common Kafka Cluster Management Pitfalls and How to Avoid Them

Managing a Kafka cluster is no small feat. While Kafka’s distributed messaging system is incredibly powerful, keeping it running smoothly takes careful planning and a keen eye on the details. Small mistakes in Kafka management can quickly add up, leading to bottlenecks, unexpected downtime, and overall reduced performance. Let’s explore some common Kafka management pitfalls and, more importantly, how to steer clear of them.

How to Prepare Your Data Estate for AI Success

It’s hard not to speak in cliches when we talk about artificial intelligence (AI). Today, AI seems to be all around us. And whatever its cultural impact, its rapid evolution is leading to widespread adoption across industries. Much of the discourse focuses on what machine intelligence can do to enrich our lives and businesses. But less has been said about data, and how every AI system relies on it to operate.

Buyer's Guide to Network Automation

In today’s complex IT landscape, network automation is no longer a luxury; it’s essential. With over 65% of enterprise networking activities still being performed manually, the shift toward automation presents a massive opportunity to improve efficiency, reliability, and scalability. This guide will help IT leaders understand the market for network automation platforms, the key considerations for choosing a platform, and how these tools are redefining network management from Day 0 to Day N activities.

Anatomy of an OTT Traffic Surge: The Fortnite Chapter 2 Remix Update

On Saturday, November 2, the wildly popular video game Fortnite released its latest game update: Fortnite Chapter 2 Remix. The result was a surge of traffic as gaming platforms around the world downloaded the latest update for the seven-year-old game. Doug Madory looks at how the resulting traffic surge can be analyzed using Kentik’s OTT Service Tracking.

Operationalizing AI for IT operations

Advances in artificial intelligence are rapidly transforming the IT operations landscape. According to Enterprise Strategy Group, 85% of organizations use or plan to deploy AI across many functional areas, including IT operations. AI has immense potential to transform how IT operations, service management, and infrastructure teams function. Adoption is the first step toward creating organizational change.

Monitoring domains and DNSSEC properly

First of all, if you own a domain, the following text is for you. In production you obviously want to reduce outages. An outage of a DNS domain takes down all services under that domain, at least from the users’ perspective, no matter whether your LAMP components are all up and running. As usual, roughly speaking, monitoring has to “play end user” to properly discover failures end-to-end. At best you have an Icinga satellite (e.g.