Operations | Monitoring | ITSM | DevOps | Cloud

Latest News

When DNS Says: Talk To The Hand!

What? This started with a post on social media, which sparked a discussion among us industry professionals. The following conversation happened when I got to talk to my coworkers about some interesting things regarding DNS responses. Putting us gearheads in a room always results in an interesting comment or two!

Advanced Kafka Performance Tuning for Large Clusters

Kafka is a beast when it comes to handling data streams at scale. But when your Kafka setup grows into a massive cluster, keeping it running smoothly? Yeah, that can feel like trying to tame a tornado. Imagine hundreds, maybe thousands, of brokers, topics, and partitions, all moving data at lightning speed. The moment one thing slows down, you’re staring at bottlenecks that could trip up your whole system. It’s not pretty.

Put Your Issue Detection and Response on Fast-Forward With GenAI

Most engineers will tell you this: Troubleshooting today feels like trying to find your way out of a wild jungle, in the middle of a storm, at night, while a countdown clock is running. In other words, it’s ambiguous, nerve-racking, and plain difficult. But should this be the norm?

What's Chaos Monkey? Its Role in Modern Testing

Chaos Monkey is an open-source tool. Its primary use is to check system reliability against random instance failures. Chaos Monkey follows the testing concept of chaos engineering, which prepares networked systems for resilience against random and unpredictable chaotic conditions. Let’s take a deeper look.

It's time to stop neglecting the elephant in the room: Performance Matters!

Ralph Marston once said, “Don't lower your expectations to meet your performance. Raise your level of performance to meet your expectations.” Many organizations today seem to do the opposite. If everything looks green on a dashboard, they assume all is well. But is it?

Deploying InfluxDB and Telegraf to Monitor Kubernetes

I run a small Kubernetes cluster at home, which I originally set up as somewhere to experiment. Because it started as a playground, I never bothered to set up monitoring. However, as time passed, I’ve ended up dropping more production-esque workloads onto it, so I decided I should probably put some observability in place. Not having visibility into the cluster was actually a little odd, considering that even my fish tank can page me.

Top 11 Grafana Alternatives [comparison 2024]

Grafana is a widely used open-source platform for monitoring and visualization. It has a lot of built-in functionality and also provides a large number of community templates that can improve your overall experience. However, Grafana requires quite a lot of configuration, and the documentation can be a bit overwhelming for beginners. In this article, we explore 11 alternatives that can be simpler to use and can provide seamless integration of traces, logs, and metrics.

An Ode to Events

At this point, it’s almost passé to write a blog post comparing events to the three pillars. Nobody really wants to give up their position. Regardless, I’m going to talk about how great events are and use some analogies to try to get that across. Maybe these will help folks learn to really appreciate them and to depreciate a certain understanding of the three pillars. Or maybe not.