What is Mean Time to Detect (MTTD) - and why does it matter for ITOps?

Have you ever wondered how efficiently your IT team detects incidents? Mean Time to Detect (MTTD) is an incident management Key Performance Indicator (KPI) that reveals your productivity during the first stage of incident resolution and points to opportunities for improvement. ITOps and DevOps teams that lower their MTTD can identify issues more quickly, minimize potential downtime, and maintain system reliability.
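
As a rough illustration, MTTD is commonly computed as the total time between when incidents began and when they were detected, divided by the number of incidents. The sketch below assumes each incident is recorded as a hypothetical (started_at, detected_at) pair; the function name and sample timestamps are invented for the example and do not come from the article.

```python
from datetime import datetime, timedelta


def mean_time_to_detect(incidents):
    """Average detection delay over (started_at, detected_at) datetime pairs."""
    if not incidents:
        raise ValueError("need at least one incident")
    # Sum the per-incident detection delays, starting from a zero timedelta.
    total = sum((detected - started for started, detected in incidents), timedelta())
    return total / len(incidents)


# Hypothetical sample data: delays of 12, 8, and 10 minutes.
incidents = [
    (datetime(2023, 12, 1, 9, 0), datetime(2023, 12, 1, 9, 12)),
    (datetime(2023, 12, 2, 14, 30), datetime(2023, 12, 2, 14, 38)),
    (datetime(2023, 12, 3, 22, 5), datetime(2023, 12, 3, 22, 15)),
]

print(mean_time_to_detect(incidents))  # 0:10:00
```

In practice the hard part is usually agreeing on when an incident "started"; the arithmetic itself stays this simple.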

Open source log monitoring: The concise guide to Grafana Loki

Five years ago today, Grafana Loki was introduced to the world on the KubeCon NA 2018 stage when David Kaltschmidt, now a Senior Director of Engineering at Grafana Labs, clicked the button to make the Loki repo public, live in front of the sold-out crowd. At the time, Loki was a prototype: We bolted together Grafana as a UI, Cortex internals, and Prometheus labels to find out if there was a need for a new open source tool to manage logs.

The Advent of Monitoring, Day 4: Solving E2E Testing Challenges With Checkly's PWT Garbage Collector

This is the fourth part of our 12-day Advent of Monitoring series. In this series, Checkly's engineers will share practical monitoring tips from their own experience. One challenge in conducting end-to-end (E2E) testing is managing the artifacts created during the process. These artifacts are necessary for asserting specific functionalities.

Failure Flags helps build testable, reliable software - without touching infrastructure

Building provably reliable systems means building testable systems. Testing for failure conditions is the only way to reliably root out issues before they impact customers. However, most current Chaos Engineering and resilience testing is focused on the underlying infrastructure. This helps identify potentially catastrophic failures, but misses the more frequent failures that still significantly impact customer experience.

The Future of IT Asset Management: 9 ITAM Trends For 2024

Your IT Asset Management (ITAM) practice is not immune from the current corporate focus on IT optimization, which includes both asset utilization and ITAM operations. This, along with increasing technology complexity, is influencing many of the ITAM trends for 2024. So, where is everything heading?

The Advent of Monitoring, Day 3: Easy Monitoring for Self-Hosted Projects with Checkly

This is the third part of our 12-day Advent of Monitoring series. In this series, Checkly's engineers will share practical monitoring tips from their own experience. When it comes to running self-hosted services or side projects, monitoring is key. But who has the time to set up a complex monitoring system? We want to deliver cool software, not spend our time configuring Prometheus servers or Grafana dashboards.

Using OpenTelemetry Collector Loki Receiver to Send Logs to SigNoz [Code Tutorial]

In this tutorial, you will learn how to collect logs using the Loki receiver in OpenTelemetry Collector and send them to SigNoz. If you’re using Promtail to collect logs, you can send them to SigNoz instead of Loki via the OpenTelemetry Collector. If you want to jump straight into implementation, start with the prerequisites section.

The Advent of Monitoring, Day 2: Debugging Dashboard Outages with Checkly's API Checks

This is the second part of our 12-day Advent of Monitoring series. In this series, Checkly's engineers will share practical monitoring tips from their own experience. We encountered a tricky issue with our public dashboards: they were experiencing sporadic outages, happening about once every two days. The infrequency and unpredictability of these outages made them particularly challenging to diagnose.