How to monitor NVIDIA GPU metrics with Elastic Observability

Graphics processing units, or GPUs, aren’t just for PC gaming. Today, GPUs are used to train neural networks, simulate computational fluid dynamics, mine Bitcoin, and process workloads in data centers. They are at the heart of most high-performance computing systems, which makes monitoring GPU performance in today's data centers just as important as monitoring CPU performance.

11 Network Traffic Terms to Know

Every industry loves its terms and jargon. Stop me if you’ve heard this one before: “I’ve always said that one of my core competencies is getting the most bang for my buck out of the sweat equity I put in during my 9-to-5.” Sure, the sentence doesn’t really make any sense, but it sounds good enough when you say it. And that’s just the point jargon tends to make. The IT industry is no different.

InfluxDB C Client Library for Capturing Statistics

Currently, there is no official InfluxDB C language client library. Fortunately, I needed exactly that for capturing operating system performance statistics on AIX and Linux. The data capturing tool is called “njmon” and is open source on SourceForge. Having worked out how to do it and developed a small library of 12 functions to make saving data simple for my own use, I thought I would share it. I hope it will prove useful to others.

Windows network monitoring made easy with OpManager

Network administrators are responsible for the day-to-day operation of computer networks at organizations of any size and scale. Their primary duty is to manage and monitor the network infrastructure to prevent and minimize downtime. Managing a network includes monitoring all the network components, including Windows devices. In any Windows network, the desktops, servers, virtual servers, and virtual machines (VMs), such as those hosted on Hyper-V, run on the Windows operating system.

All You Need To Know About Cloud Interconnection

Many enterprises today have a range of assets residing in a mixture of public and private clouds. As a result, there is a need to connect not just site-to-cloud but also cloud-to-cloud, use cases we would term Data Centre Interconnection (DCI) and Cloud Interconnection.

A Quick Guide to Log Shipping To Logz.io: Collectors, Code, and Clouds

One of the great things about Logz.io Log Management is that it’s based on the most popular open source logging technology out there: the ELK Stack (see our thoughts and plans on the recent Elastic license). This means Logz.io users get to leverage log shipping and collector options within the rich ELK ecosystem. So how do you know which log shipping technology to use?

Troubleshooting Large Queues in RabbitMQ

If you’re a RabbitMQ user, chances are that you’ve seen queues growing beyond their normal size. This causes messages to get consumed long after they have been published. If you’re familiar with Kafka monitoring, you’ll call it consumer lag, but in RabbitMQ-land it’s often called queue length or queue depth.