The latest news and information on monitoring for websites, applications, APIs, infrastructure, and other technologies.

Efficiently Monitor the State of Redis Database Clusters

Monitoring Redis, the open source in-memory data platform, is complicated enough when you are hosting your Redis instance on just a single server. It gets even more complex when you build a Redis cluster that consists of multiple nodes and distribute your data across them. But as long as you know which metrics to prioritize and how to collect them, Redis monitoring is feasible enough. This article offers an overview of how to monitor the state of Redis database clusters.

TL;DR InfluxDB Tech Tips: Debugging and Monitoring Tasks with InfluxDB

With InfluxDB, you can use tasks to process data on a schedule. You can also use tasks to write custom alerts. Sometimes, however, a task will fail. In this TL;DR, we’ll learn how to debug a task with the InfluxDB UI and the InfluxDB CLI.

VPN and Firewall Log Management

The hybrid workforce is here to stay. With that in mind, you should start putting more robust cybersecurity controls in place to mitigate risk. Virtual private networks (VPNs) help secure data, but they are also challenging to bring into your log monitoring and management strategy. VPN and firewall log management gives real-time visibility into security risks. Many VPN and firewall log monitoring problems are similar to those of log management in general.

Trace AWS event-driven serverless applications with Datadog APM

Last year, we released native tracing for AWS Lambda through Datadog APM to provide deep visibility into serverless functions and surface performance issues such as cold starts and errors, without any added latency. But Lambda functions are only one piece of the puzzle in a rapidly growing serverless ecosystem, which includes message queues, data streams, notification services, and more.

Quick Test Feature

One feature that is not available in the Monitive service, but has proven to be a useful helper, is the ability to quickly check a website from several locations around the world. Just head over to the homepage and type in a website, with or without https://. Press Test Availability and you instantly get an overview of how your website is performing from each of those locations.

Observability & AIOps, the perfect combination for dynamic environments

IT teams live in dynamic environments, and continuous integration/continuous delivery is in high demand. In these environments, DevOps and its underlying technologies, such as containers and microservices, continue to grow more dynamic and complex. Now, just like DevOps, observability has become a part of the software development life cycle.

More Changes Mean More Challenges for Troubleshooting

The widespread adoption of Agile methodologies in recent years has significantly increased organizations' ability to push out high-quality software. Previous development practices revolved heavily around centralized applications and infrequent updates that were shipped maybe once a quarter or even once a year.

Why Your Mean Time to Repair (MTTR) Is Higher Than It Should Be

Mean time to repair (MTTR) is an essential metric that represents the average time it takes to repair and restore a component or system to functionality. It is a primary measurement of the maintainability of an organization’s systems, equipment, applications and infrastructure, as well as its efficiency in fixing that equipment when an IT incident occurs. Key challenges with MTTR arise from just trying to figure out that there is actually a problem.
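
As a rough illustration of the definition above, MTTR can be computed as the total repair time divided by the number of incidents. The sketch below is a minimal example under stated assumptions: the incident timestamps are made up, the function name `mean_time_to_repair` is hypothetical, and the clock is assumed to start when the failure is recorded, which may differ from how a given organization measures it.

```python
from datetime import datetime, timedelta


def mean_time_to_repair(incidents):
    """Return the average repair duration for (started, restored) pairs."""
    if not incidents:
        raise ValueError("need at least one incident")
    # Sum the individual repair durations, starting from a zero timedelta.
    total = sum(
        (restored - started for started, restored in incidents),
        timedelta(),
    )
    return total / len(incidents)


# Hypothetical incident log: (failure recorded, service restored).
incidents = [
    (datetime(2021, 9, 1, 10, 0), datetime(2021, 9, 1, 10, 30)),  # 30 min
    (datetime(2021, 9, 3, 14, 0), datetime(2021, 9, 3, 15, 30)),  # 90 min
    (datetime(2021, 9, 7, 8, 15), datetime(2021, 9, 7, 9, 15)),   # 60 min
]

mttr = mean_time_to_repair(incidents)
print(mttr)  # 1:00:00
```

If the start timestamp is recorded late because nobody noticed the problem, the computed value understates the real outage, which is one reason detection time matters for this metric.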