Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Building a Distributed Security Team

In this live stream, Cjapi’s James Curtis joins me to discuss the challenges of building a distributed global security team. Watch the full video or read on to learn about some hard-won examples of how to be successful with remote team building and management. Talent is hard to find, and companies are hiring from all over the world to build the best teams possible, but this trend has a price.

Machine Learning for Fast and Accurate Root Cause Analysis

Machine Learning (ML) for Root Cause Analysis (RCA) is the state-of-the-art application of algorithms and statistical models to identify the underlying reasons for issues within a system or process. Rather than relying solely on human intervention or time-consuming manual investigations, ML automates and enhances the process of identifying the root cause.

Grafana 10.1: TraceQL query results streaming

Tempo offers amazing performance, but there are still cases where TraceQL queries take a long time to return results. This could be due to a multitude of reasons from the complexity of the query, amount of choices stored, or the timeframe selected. See how to navigate your query results more quickly, with query results streaming, available as an experimental feature in Grafana version 10.1.

Find Trending Problems Faster with Escalating Issues

Knowing what issues to hit the snooze button on, or drop everything and push a hotfix for is a common developer dilemma. Similarly to what was discussed in Sleep More; Triage Faster with Sentry, we’ve been collecting and iterating on customer feedback for ways to reduce issue noise and surface high-priority issues faster.

How to create an alert rule in Grafana 10.1

You may have built an alert rule with Grafana Alerting and then grappled with routing, reconfiguring, and managing the different alerts your team set up. To address this challenge, we’ve implemented a series of improvements to set up and maintain alert rules in Grafana. Watch how the new alerting workflow works.

Gateways and BindPlane

The BindPlane Agent is a flexible tool that can be run as an agent, an aggregator, or both. As an agent the collector will be running on the same host it's collecting telemetry from, while an aggregator will collect telemetry from other agents and forward the data on to their final destination. Here are a few of the reasons you might want to consider inserting Aggregators into your pipelines: Today we will examine these reasons, and some possible architectures for implementing aggregators.