Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

How to Control Alert Fatigue?

Alerts are indispensable to any IT operations system today. Site reliability engineers (SREs) or ITOps executives set up several monitoring tools for their IT landscape. When there is a change, high-risk action, or outage in any of these incidents, the monitoring tool triggers an automated alert. This could happen on the monitoring tool’s dashboard itself, via email, or enterprise collaboration tools like Slack or Teams.

The Five Data Pillars of Effective Root-Cause Analysis

The most effective way to understand an incident, resolve it and prevent it from occurring again is root-cause analysis. Simply put, root-cause analysis is the study performed by ITOps teams or site reliability engineers (SREs) to pinpoint the exact element/error that caused the unexpected behavior. Based on this, they plan remediation. Accurate and timely root-cause analysis can have a direct impact on the company’s top and bottom line.

Spotting and Avoiding Database Drift

Managing any database ecosystem is difficult enough: taking backups, maintaining statistics, and doing performance tuning all tax the time of the DBA or database developer. The job is complex even without considering the work you do to manage the various schema and data drifts that can occur. Unless you operate in a vacuum or within a single person organization (and even then, schema drift can occur), drift is going to manifest naturally and as the size of the environment expands.

Metrics now generally available in Honeycomb

Starting today, Honeycomb Metrics is now generally available to all Enterprise customers. You’ve adopted our event-based observability practices, in part to overcome the debugging roadblocks you hit when using custom metrics to identify application issues. But metrics do still provide value at the systems level. Now, you can easily see and use your metrics data alongside your event data in Honeycomb—all in one interface.

Tracing AWS Lambdas with OpenTelemetry and Elastic Observability

Open Telemetry represents an effort to combine distributed tracing, metrics and logging into a single set of system components and language-specific libraries. Recently, OpenTelemetry became a CNCF incubating project, but it already enjoys quite a significant community and vendor support. OpenTelemetry defines itself as “an observability framework for cloud-native software”, although it should be able to cover more than what we know as “cloud-native software”.

Painting a Complete Network Monitoring Picture: Why Context is Critical

In order to produce their masterpieces, artists like van Gough, Rembrandt, Picasso, and Monet painted with more than just one color. Being able to choose from multiple colors (not to mention an abundance of talent, inspiration, and creativity) is what allowed these artists to see their complete vision come to life on canvas. However, if you’re relying on a single set of data to troubleshoot network issues, it’s like you’re stuck painting with one color.

5 Best IT Experience Practices Your Team Can Make Today

If you were to put 100 enterprise tech leaders in a room together and ask them if they think their company’s employee experience is dependent upon IT, I’m certain all would agree it is. But I’m also certain those 100 wouldn’t know: For IT decision-makers, the devil is in the details. Many are judged by uncompromising Service Level Agreements (SLAs) and shoddy survey data, not comprehensive digital experience trends and indexes.

CheckMK and Enterprise Alert - a scripted heartbeat check

A few days ago I received an inquiry about a scripting problem from one of our longtime partners, to be exact our DCP Marc Handel from IT unlimited AG. In the exchange with Marc I realized that his idea to use the Enterprise Alert Scripting Host, the Windows Task Scheduler and CheckMK to realize a roundtrip monitoring could be interesting for the whole community. Especially for all our CheckMK customers.