Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on Service Reliability Engineering and related technologies.

AI SRE in Practice: Enabling Non-Experts to Troubleshoot Kubernetes

Kubernetes troubleshooting traditionally requires deep platform expertise. Understanding pod lifecycle, decoding error messages, correlating events across resources, and identifying root cause all demand experience that takes years to build. This expertise gap creates a bottleneck where only senior engineers can handle production issues, limiting how quickly teams can resolve incidents.

AI SRE in Practice: Accelerating Engineer Onboarding with Contextual Expertise

Onboarding new engineers to complex Kubernetes environments is expensive. Junior engineers need to learn cluster architecture, understand organizational conventions, navigate internal documentation, and build relationships with senior team members who can answer questions. The process takes weeks or months, and during that time, senior engineers spend significant time mentoring instead of working on complex problems.

Database Partitioning: Types, Strategies, and When to Use Each

How database partitioning works in PostgreSQL and MySQL. Range, list, and hash partitioning with SQL examples and guidance on when to partition vs shard. Prathamesh works as an evangelist at Last9, runs SRE stories - where SRE and DevOps folks share their stories, and maintains o11y.wiki - a glossary of all terms related to observability.

Database Sharding: How It Works and When You Actually Need It

How database sharding works, common strategies (hash, range, directory), shard key selection, and the operational cost of running a sharded database in production. Prathamesh works as an evangelist at Last9, runs SRE stories - where SRE and DevOps folks share their stories, and maintains o11y.wiki - a glossary of all terms related to observability.