Alerting

Enhancing Postmortem Reports with AI

Postmortem reports are essential in incident management, helping teams learn from past mistakes and prevent future issues. Traditionally, creating these reports was a slow, tedious process, requiring teams to gather data from multiple sources and piece together what happened. But with AI and Large Language Models (LLMs), this process can become faster, smarter, and much less of a headache.

Creating In-Stream Alerts for Telemetry Data

Alerts from your observability tool reflect conditions that existed seconds to minutes in the past, because an alert is triggered only after the data has been indexed within the tool. This significantly limits your ability to act in time, and the window of opportunity to react has often passed by the time you receive the alert.

Introducing Anomaly Detection: Smarter Alerts for Dynamic Metrics

Anomaly Detection will enable users to create smarter alerts based on dynamic metrics, moving beyond traditional fixed-threshold alerts. By detecting deviations from expected patterns, Anomaly Detection will help you stay informed about critical issues without getting overwhelmed by irrelevant alerts.

Introducing Anomaly Detection - Smarter Alerts for Dynamic Metrics

Today, we’re excited to unveil the Anomaly Detection feature. It will enable users to create smarter alerts based on dynamic metrics, moving beyond traditional fixed-threshold alerts. It will soon be available to all our users and is currently undergoing beta testing with select users. By detecting deviations from expected patterns, Anomaly Detection will help you stay informed about critical issues without getting overwhelmed by irrelevant alerts. Let’s dig in deeper.
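To make the contrast with fixed-threshold alerts concrete, here is a minimal sketch of one common approach: a rolling z-score over a trailing window. This is not the feature's actual algorithm, which the excerpt does not describe. The function name `detect_anomalies` and the parameters `window` and `z_threshold` are assumptions chosen for illustration.

```python
from collections import deque
from statistics import mean, stdev


def detect_anomalies(values, window=5, z_threshold=3.0):
    """Return indices of points that deviate strongly from recent history.

    Hypothetical helper: a point is flagged when it lies more than
    `z_threshold` sample standard deviations away from the mean of the
    previous `window` points. This is an illustrative technique only.
    """
    history = deque(maxlen=window)  # trailing window of recent values
    anomalies = []
    for i, value in enumerate(values):
        if len(history) == window:
            mu = mean(history)
            sigma = stdev(history)
            # A perfectly flat window (sigma == 0) is skipped to avoid
            # dividing by zero; a real system would need a policy here.
            if sigma > 0 and abs(value - mu) / sigma > z_threshold:
                anomalies.append(i)
        history.append(value)
    return anomalies


# Sample metric with one spike; a fixed threshold of, say, 60 would miss it.
cpu_load = [10, 11, 10, 12, 11, 10, 50, 11, 10, 12]
flagged = detect_anomalies(cpu_load)
```

With this sample series, only the spike at index 6 is flagged, even though it never crosses a high static limit. The expected range adapts to what the metric has been doing recently.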

Reduce Noise through Intelligent Alert Grouping

In an ideal world, every alert would signal a unique and critical issue. However, in reality, alerts often come in waves. Alert noise refers to the overwhelming volume of notifications that incident response teams receive, many of which may be redundant or irrelevant. This can lead to alert fatigue, where critical issues might be overlooked due to the sheer number of notifications.
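As a rough illustration of how grouping can cut redundant notifications, the sketch below folds alerts that share a service and check into one group while they keep arriving within a quiet-period window. It is an assumption-laden example, not the grouping logic of any particular product. The `Alert` and `AlertGroup` types, the grouping key, and `window_seconds` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    service: str
    check: str
    timestamp: float  # seconds since some reference point


@dataclass
class AlertGroup:
    key: tuple
    first_seen: float
    last_seen: float
    alerts: list = field(default_factory=list)


def group_alerts(alerts, window_seconds=300):
    """Group alerts with the same (service, check) key.

    Hypothetical rule: an alert joins the latest group for its key if it
    arrives within `window_seconds` of that group's last alert; otherwise
    it starts a new group.
    """
    latest_by_key = {}  # most recent group per key
    groups = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        key = (alert.service, alert.check)
        group = latest_by_key.get(key)
        if group is None or alert.timestamp - group.last_seen > window_seconds:
            group = AlertGroup(key, alert.timestamp, alert.timestamp)
            latest_by_key[key] = group
            groups.append(group)
        group.alerts.append(alert)
        group.last_seen = alert.timestamp
    return groups


incoming = [
    Alert("db", "cpu", 0),
    Alert("db", "cpu", 60),
    Alert("web", "latency", 30),
    Alert("db", "cpu", 1000),
]
groups = group_alerts(incoming)
group_sizes = [len(g.alerts) for g in groups]
```

Here four raw alerts collapse into three groups, so a responder would be notified once for the two closely spaced `db` CPU alerts instead of twice. Real systems typically use richer keys and smarter windows, but the principle is the same.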

Icinga Notifications: Incidents, Escalations, and Event Rules

Following the Icinga Notifications beta announcement, we published a more general post on how to get started and another going into the details of schedules. This week’s blog post is a follow-up in this series and describes incidents, escalations, and event rules in Icinga Notifications in more detail. If you haven’t seen the first two posts, you might want to have a look at them first; otherwise, you could miss out on the big picture.