
The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Production testing: smoke tests with Cypress, CircleCI, and AWS

“Testing your production environment” refers to the practice of running tests on production servers, using actual data from real users. Production testing doesn’t replace other methods like unit or integration testing. Instead, it extends them. Smoke testing is one approach that Lumigo has implemented to test our own production environments.

Product Spotlight: Logz.io Telemetry Collector for Fast Data Shipping

Today we’re excited to announce Logz.io Telemetry Collector, an agent that sends logs, metrics, and traces to Logz.io from a single installation as part of our Open 360™ platform. Telemetry Collector simplifies the data collection process, so customers can get started monitoring their services with Logz.io faster than ever.

Grafana Loki 2.7 release: TSDB index, Promtail enhancements, and more

Grafana Loki 2.7 has arrived! With it comes an experimental feature we are rather excited about: a redesigned index based on the Prometheus TSDB index. While we are still in the early stages, this enhancement, which we previewed at ObservabilityCON 2022, brings a smaller storage footprint, better query performance, and much more that we will dive into below!

Cloud Monitoring: A Complete Guide

Cloud monitoring is the process of tracking, reviewing, and managing the health and security of cloud-based systems and applications. It is essential for any organization that relies on cloud-based applications and services, because it provides visibility into the performance of these systems and can help identify potential issues before they cause downtime or data loss.

Resource Guide for InfluxDB and AWS

InfluxDB Cloud runs natively on AWS. This is great for users who already rely on AWS because it keeps everything (or at least most things, hopefully!) in one place. It can also reduce data latency if the region you use is geographically close to your data sources. Plus, it’s super easy to get started using InfluxDB on AWS. One of the great things about AWS is that it has a ton of different services and features that allow you to do more with your data.

How to Setup InfluxDB, Telegraf and Grafana on Docker: Part 2

In Part 1 of this tutorial series, we covered the steps to install InfluxDB 1.7 on Docker for Linux instances. In Part 2, we describe how to install the Telegraf plugin as a data-collection interface with InfluxDB 1.7 and Docker.

A day in the life of a Customer Support Detective

I open my laptop and look over my cases while I slurp down my first cup of coffee. Most of my backlog is waiting on customer updates or bug fixes. Two of my cases have been marked for closure. Not a bad start for a Monday! A pod CrashLoopBackOff issue was resolved by bumping up memory requests, and the missing metrics issue was solved after applying some Prometheus annotations to the customer’s nginx pods. I notate and close both cases. No sooner do I hear the beep of the badge scanner.

A Simplified Guide to OpenTelemetry

Digital services are increasingly built as a collection of components working in concert to deliver significant business functions. Understanding how these components of a system are working is crucial to reliably delivering a service. With many systems interacting, it can be difficult, if not impossible, to understand the state of your services and their dependencies without detailed data about how they function.

FluentD vs Logstash - Choosing a Log collector for Log Analytics

When we have large-scale, distributed systems, logging becomes essential for observability, monitoring, and security. Whatever their architecture (monolith or microservices), these systems are complex because of their many moving parts and the challenges of management, deployment, and scaling. Here, log management tools help DevOps and SRE teams monitor and improve performance, debug errors, and visualize events.