
Alerting

Revolutionizing your Grafana setup with intelligent alerting

Once upon a time, in the bustling city of DataVille, there lived a team of dedicated IT professionals who worked tirelessly to maintain the city’s digital heartbeat. Their mission was to keep the city’s digital infrastructure running smoothly, not only during the day but well beyond business hours. They were the unsung heroes, the guardians of the city’s data. Their tool of choice? Grafana, a powerful open-source platform for observability.

October 2023 Update - New layout, additional cross links, improved event filtering and much more

Our October update brings a new layout in the web portal, additional cross-references from Signl details to linked entities, and improved grouping options for conditions in the distribution rules. As always, all the details are in this blog article.

Alerting, Incident Management and the SDLC | Better Incidents Podcast Ep. 8

In this episode we chat with veteran cloud architect Masaru Hoshi about the challenges of alert fatigue, the importance of effective alerting systems, and fostering ownership in software teams. Masaru shares insights from his 30-year career, emphasizing the need for balance, trust, and collaboration in incident response.

Global Event Rulesets: Streamlining Alert Routing Across Services

In the fast-paced world of organizations handling numerous microservices and projects, tackling the challenges that arise can be a daunting task. Because many of our customers run infrastructures with a large number of microservices, we set out to make it easier for them to streamline alert source management. Enter Global Event Rulesets (GER). This feature is designed to redefine the way you manage alerts.

Choosing the Right Metrics for Noiseless K8s Alerting

Watch Ankur Rawal and Dheeraj Reddy talk about how to choose the right metrics for noiseless K8s alerting, with insights and suggestions based on the mistakes made by hundreds of companies while implementing Prometheus Alertmanager in their production systems, and learn how much bad monitoring could be costing you. This talk was delivered at PromCon 2023 in Berlin.

What Is the Role of an Incident Commander?

For most businesses, managing major incidents can be intimidating. With a swarm of information coming from different directions, keeping things organized and maintaining clear, effective communication is tough. It only gets worse when there's no defined process to follow. This disorganization confuses everyone, delays responses, and increases the incident escalation rate. Enter the incident commander (IC).

Runbook vs. Playbook: Meaning, Differences, and Uses

It’s exhausting, right? Having to repeat instructions or answer the same questions whenever your incident response teams experience a problem. At first, it may have been exciting — it was fulfilling to answer these questions and help your teams solve minor security alerts. You were the hero! You went ahead and documented all this information. But as your company grew and your attention was needed in other areas, these questions and issues started to lengthen incident response time.

The new principles of incident alerting: it's time to evolve

In the ever-evolving world of software engineering, the landscape is constantly shifting. New technologies emerge, best practices evolve, and how we build and run software continues to change. However, when it comes to incident alerting, it often feels like we're stuck in the past.

Alternatives to SMS alerts

While SMS alerts are handy, they also tend to be tricky. Across 120+ countries, we continuously deal with compliance requirements and regulations from vendors, governments, and phone carriers. Other alert channels similar to SMS are far less cumbersome and offer higher delivery rates. Let’s take a look at the options available for switching away from SMS.