Operations | Monitoring | ITSM | DevOps | Cloud

We now support Google Chat

I'm pleased to share that we can now notify you via Google Chat. You can read more on how to set up Google Chat notifications in our docs. Of course, we also offer numerous other channels to notify you when something is wrong with your site.

Introduction to Kafka Scaling Challenges

Apache Kafka has become the go-to platform for organizations handling high-throughput, real-time data streaming. Its ability to manage massive data volumes while ensuring reliability is second to none. However, as businesses grow and demand for data increases, scaling Kafka isn’t always a walk in the park. It often comes with its own set of challenges that can throw even the most seasoned teams for a loop.

Liquid Cooling vs. Air Cooling: What's Right For Your Data Center?

As power-hungry workloads like AI and HPC become the norm, data centers face mounting pressure to rethink their thermal strategies. Traditional air cooling has long been the industry standard, but with rising rack densities and energy costs, many operators are exploring liquid cooling as a more efficient alternative. In 2024, the global liquid cooling market was valued at around $4.18 billion, and it is projected to reach $13.2 billion by 2029.

Observability in under 5 seconds: Reflecting on a year of grafana/otel-lgtm

With grafana/otel-lgtm, observability is just one Docker command away. Over the past year, grafana/otel-lgtm has simplified observability setups, helping developers get a complete OpenTelemetry stack running in under five seconds. With integrations for metrics, logs, traces, and now profiles via Grafana Pyroscope, it has become a go-to solution for demos, development, and testing, as evidenced by its growing community (1k stars on GitHub and growing!) and notable adopters.

How a Fortune 500 Company Eliminated 93% of IT Incidents in 72 Hours

Sometimes the biggest transformations begin with what sounds like the worst possible news. One day, this Fortune 500 technology company’s observability platform was running smoothly. The next, they learned their critical monitoring solution would be discontinued as part of a corporate buyout. For a leading global IT vendor in data infrastructure serving customers across storage, cloud, and managed services, this was a potential catastrophe.

An open-source SDK for finding dead code

Writing code is easier than ever. We want to make deleting code just as easy – introducing Reaper for iOS and Android. Reaper was an Emerge Tools product that helped companies like Duolingo delete 1% of their iOS codebase. And just like with Emerge Tools’ Launch Booster, we’re making Reaper open-source for anyone to use. In this post, we’ll explain what Reaper is, why you should care about dead code, and how Reaper works on both platforms.

How Replicas Work in Kubernetes

Replicas in Kubernetes control how many copies of your pods run simultaneously. They're the foundation of scaling, availability, and recovery in your cluster. When you're running a stateless API or a background worker, understanding how replicas work directly impacts your application's reliability and performance. This blog walks through replica management, from basic concepts to production monitoring patterns that help you maintain healthy, scalable applications.
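The idea of keeping a set number of pod copies running can be illustrated with a toy reconciliation step. This is a simplified sketch, not Kubernetes' actual controller code: the `reconcile` function and the `new-N` pod names are hypothetical, and real controllers also account for pod health, labels, and deletion ordering.

```python
def reconcile(desired: int, running: list[str]) -> tuple[list[str], list[str]]:
    """Return (pods_to_create, pods_to_delete) so the running count matches desired.

    Toy model only: pods are plain names, and surplus pods are taken
    from the end of the list.
    """
    diff = desired - len(running)
    if diff > 0:
        # Too few copies: ask for `diff` new pods (hypothetical names).
        return [f"new-{i}" for i in range(diff)], []
    # Too many copies (or exactly enough): everything past `desired` is surplus.
    return [], running[desired:]


# Scale up: one pod is running, three are wanted.
to_create, to_delete = reconcile(3, ["api-a"])

# Scale down: three pods are running, one is wanted.
surplus = reconcile(1, ["api-a", "api-b", "api-c"])[1]
```

The same comparison covers recovery: if a pod disappears, the running count drops below the desired count and the next pass requests a replacement.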

Improve Consistency Across Signals with OTel Semantic Conventions

It’s 2 AM. Your API is timing out. Logs show a slow query. Metrics flag a spike in DB connections. Traces reveal a 5-second delay on a database call. But then the questions start:

- Which database?
- Does the query match the delay?
- Why doesn’t this align with the connection pool metrics?

Each tool uses different labels, db.name, database, sometimes nothing at all. Without a shared schema, connecting the dots is slow and frustrating.
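To make the label mismatch concrete, here is a minimal sketch of renaming ad-hoc keys to one shared key so that logs, metrics, and traces can be joined. The alias table, the `normalize` helper, and the sample records are all hypothetical. `db.name` is used as the shared key only because it appears in the text above; the key your version of the semantic conventions prescribes may differ.

```python
# Hypothetical alias table: ad-hoc label names mapped to one shared key.
ALIASES = {"database": "db.name", "db": "db.name", "db_name": "db.name"}


def normalize(attributes: dict) -> dict:
    """Return a copy of `attributes` with known aliases renamed to the shared key."""
    out = {}
    for key, value in attributes.items():
        # If several aliases are present, the first one seen wins.
        out.setdefault(ALIASES.get(key, key), value)
    return out


# Made-up records from three signals, each labelling the database differently.
log_record = {"database": "orders", "msg": "slow query"}
metric_labels = {"db.name": "orders", "pool": "primary"}
span_attrs = {"db": "orders", "duration_ms": 5000}

signals = [normalize(s) for s in (log_record, metric_labels, span_attrs)]

# After normalization, all three signals can be matched on the same key.
databases = {s.get("db.name") for s in signals}
```

Emitting the shared key at the source is usually preferable to renaming after the fact; the sketch only shows why a common schema makes correlation a simple lookup.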

Celebrating our Top Tech Award win with Back Market

We are proud to share that Aiven has been awarded the Top Tech Award in the “Transformation and Cloud” category by L’Informaticien, one of France’s leading IT publications. The honour comes as part of the 2025 Top Tech Awards, a celebration of standout achievements in digital innovation, transformation, and cloud excellence. This award is a major milestone.

6 OpsGenie Alternatives for On-Call Management

You’re likely here because you heard the news: Atlassian ended new sales for OpsGenie on June 4, 2025, with a complete shutdown scheduled for April 2027. For years, OpsGenie has been the backbone of on-call management for countless teams. It might have been your team’s trusted solution too. But now, that chapter is closing. The pressure to find an OpsGenie alternative for on-call is real. However, you can’t just pick any tool and hope it works for your team.