How to filter metrics by label?

It is easy to get lost in the mountain of metrics and the seemingly infinite number of dimensions when working with an infrastructure monitoring tool. Being able to filter metrics by label, and to visualize only what is relevant to the current scope of monitoring and troubleshooting, is crucial to the success of SREs, sysadmins, and DevOps professionals.

So You've Troubleshooted the Alert. Now What?

Welcome to the companion post to So You Received an Alert. Now What? Last time, we broke down the process between receiving the Uptime.com check alert and figuring out what broke. Today, we’re going to show you how to communicate your efforts so that everyone – your end users, coworkers, and bosses – knows what’s going on. Your first step is to update your Status Page, your central hub for incident management and communication.

Datadog on gRPC

Datadog, the observability platform used by thousands of companies, is made up of hundreds of services that communicate over the network using gRPC, an RPC framework, which makes gRPC a critical component of Datadog’s reliability. As teams investigated incidents related to their services, they discovered that some of them were gRPC-related. But were there common patterns to those incidents? Could we use them to learn more about gRPC and how to use it better?

Tutorial: How to Use ChaosSearch with Grafana for Observability

In my last blog post, Building a Cost-Effective Full Observability Solution Around Open APIs and CNCF Projects, we introduced using ChaosSearch in combination with the most popular open source front- and back-ends in the application observability space. In case you missed it, the TL;DR version is that you can use a variety of open source projects and open API-based components to build the best-of-breed observability stack of your choice rather than relying on expensive, all-in-one solutions.

It's a Three-Peat For Cribl with Awards from Comparably

When we began the week, we had zero awards from Comparably. As we end the week, we now have a three-peat of awards. Cribl was recognized among 70,000 companies out of 15 million ratings – winning top honors for Happiest Employees, Best Compensation, and Best Perks and Benefits. We’re thrilled to be recognized by Comparably, and we’re looking forward to continuing our pursuit of being the best place to work.

Understanding the Different IT Security Certifications

Data security is more important than ever. High-profile cyber attacks in 2021, like the Colonial Pipeline Breach, caused major services to grind to a standstill. Ransomware is still on the rise, and there’s a fear that cybercriminals have the ability to break through 93% of company networks.

External Services Monitoring for Python

Python web applications are taking over more and more of the internet (source). However, with great Pythonic power comes great responsibility — ensuring that your web applications consistently deliver in terms of performance and reliability. It is one thing to build and ship an application and another to continually monitor and maintain it on the internet.

What is a Service Catalog in ITIL? 6 Tips to Nail it!

An IT service catalog is a one-stop shop that displays all the services offered by an organization, and you can build it in just four steps. In a nutshell, it's a centralized database of active IT services that gives end users clear and accurate information on what the IT department offers. The ultimate goal of an ITIL service catalog is to simplify the process of making an IT service request. However, mastering this self-service option requires paying attention to several details.