Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on Service Reliability Engineering and related technologies.

How to Use MySQL Performance Analyzer

If you're dealing with slow MySQL queries and wondering why your database performance is lagging, you're not alone. MySQL performance analyzers are key tools for pinpointing bottlenecks, optimizing queries, and ensuring your databases stay efficient and responsive. Let’s explore how these tools can help you keep things running smoothly.

Apache Cassandra Monitoring: Tools, Challenges & Best Practices

When your distributed database architecture scales to handle massive workloads, keeping tabs on everything becomes critical and complex. With its masterless architecture and linear scalability, Apache Cassandra powers mission-critical applications across industries—but without proper monitoring, you might as well be flying blind through a storm.

The New Rootly Ringtones: How Research-based On-Call Sounds

We set out to create a ringtone that wasn’t just loud—but the sound of a modern pager. Something that wakes you up, but without triggering a full-blown adrenaline spike. In this video, go behind the scenes with sound engineer Gorjão as he crafts a how research-based on-call sound sounds like.

GDPR Log Management: A Practical Guide for Engineers

GDPR compliance for logs can be tricky—especially when you're trying to maintain system visibility and protect user data at the same time. For SREs and IT teams, it’s a balancing act between staying on the right side of privacy laws and not losing the context you need to troubleshoot. This guide walks through practical ways to handle personal data in logs, set up retention rules that make sense, and stay compliant without creating unnecessary friction.

Why Reliability Starts with the Network, even in the AI era, with Marino Wijay

In this episode, we explore how networking has shaped reliability as we know it. Marino Wijay cloud networking expert and Staff Solutions Architect at Kong shares how his journey began not as an SRE, but with cables, routers, and switches. Marino explains the evolution of the fabric holding systems together through virtualization, and how software-defined networking, which is now a key element to resilient applications.

A Closer Look at Docker Build Logs for Troubleshooting

In the world of containerization, understanding what's happening under the hood during image builds can mean the difference between smooth deployments and frustrating debugging sessions. Docker build logs are your window into this process, offering crucial insights that help you optimize builds, troubleshoot errors, and maintain robust container infrastructure.

How to Connect ELK Stack with Grafana

In today’s distributed systems world, you need clear visibility into logs, metrics, and everything in between to keep systems healthy and reliable. That’s where the ELK Stack and Grafana work well together—each solving a different part of the observability puzzle. ELK handles the heavy lifting of log collection and processing. Grafana adds intuitive dashboards and powerful visualizations.

Everything You Need to Know to Start Monitoring Postgres

Keeping your Postgres databases healthy is non-negotiable if you care about application performance and reliability. But monitoring Postgres the right way? That’s where things get tricky. Between the sheer volume of metrics and the noise that comes with them, it’s not always obvious what to pay attention to—or when. This guide breaks things down with a focus on what matters in real-world production setups.