
Latest News

Website Monitoring for Black Friday and Cyber Monday: Best Practices

As Black Friday and Cyber Monday approach, eCommerce websites brace themselves for the year’s highest traffic. These retail-heavy events are prime opportunities for businesses to maximize their sales, but they also bring intense pressure on websites to perform at their peak. When it comes to online shopping, even a few seconds of delay or downtime can lead to frustrated customers, abandoned carts, and lost revenue.

Easily control observability collectors at scale with Fleet Management in Grafana Cloud

Managing observability workloads can quickly overwhelm even the most experienced admin. Maybe you’re dealing with multiple departments, each needing its own collector configurations and pipelines. Every time you have to run a test or roll out a change, the process is cumbersome and introduces risk. Or perhaps you’re responsible for tracking hundreds of collectors across different environments and regions. In a scenario like this, troubleshooting individual issues feels nearly impossible.

Deploying Prometheus With Docker

There are several ways to deploy the Prometheus monitoring tool in your environment. One of the fastest ways to get started is to run it as a Docker container. This guide shows you how to quickly set up a minimal Prometheus instance on your laptop, which you can then extend with a monitoring dashboard, alerting, and authentication.

Are ChatGPT or Claude better than Playwright Codegen?

I'm a bit of an AI skeptic. And even though GitHub Copilot is my daily auto-completion on steroids, I always double-check the code generated by LLMs. If you're using AI for coding, you probably know that the results are sometimes surprisingly good and other times shockingly terrible. Lately, I have seen more and more articles and even docs recommending ChatGPT to generate Playwright tests. Could this be true? Are ChatGPT and friends really that good at generating test code?

How DX NetOps Fuels Rapid, Accurate Isolation in Modern Networks

Businesses, like pretty much all of us, continue to grow ever more reliant upon network connectivity. When that connectivity falters, it can be extremely disruptive—and very costly. According to a report in PCMag, an internet shutdown of one minute can cost a business like Amazon almost $978,000 in revenue losses. For Alphabet, the number is $538,000.

Mastering Tail Sampling for OpenTelemetry: Cost-Effective Strategies with Cribl

Recently, I have seen a growing number of enterprises move toward OpenTelemetry (OTel) for application tracing. Tail sampling, in particular, has emerged as a preferred approach to gain actionable insights while balancing data volume and cost. OpenTelemetry lets developers and practitioners instrument their code with open-source tools rather than relying on vendor-provided tools for application instrumentation.

Leveling up your observability practice - Part 1

Lessons from the front lines: Moving to observability maturity. What separates the observability experts from the novices? It's a question that's been on my mind lately, especially after diving into our recent 2024 State of Observability Survey of more than 500 practitioners. In my past roles as a DevOps engineer and a site reliability engineer (SRE), I've seen firsthand how a mature observability practice can be the difference between sleepless nights and smooth sailing.