
Latest Posts

AWS re:Invent '24: Generative AI Observability, Platform Engineering, and 99.9995% Availability

I attended the Amazon Web Services re:Invent conference, AWS's annual user conference, which takes over most of Las Vegas for a week. There’s a lot to do and take in: customer stories galore, new tech, different use cases to learn, and all the walking. But you’re here to hear what I learned, so I’ve broken it down into sections. Enjoy!

From Gartner IOCS 2024 Conference: AI, Observability Data, and Telemetry Pipelines

Last week, I attended one of the last conferences of the year with team Mezmo: the Gartner IT Infrastructure, Operations & Cloud Strategies Conference in Las Vegas. Not surprisingly, there were over 20 sessions covering observability and how it is becoming increasingly critical in today's complex distributed computing environments. Of course, many sessions, including all the keynotes, addressed the advent and impact of AI on IT operations and observability.

Our Team's Learnings from KubeCon: Use Exemplars, Configuring OTel, and OTTL Cookbook

A few weeks ago, members of the Mezmo team were at KubeCon and attended several sessions. You can see a post with my recap and session highlights. Today, though, I’m going to discuss three sessions that my colleagues found interesting for our peers in observability.

Webinar Recap: 2024 DORA Report: Accelerate State of DevOps

I had a fantastic opportunity to sit with Ben Good of Google and Rich Prillinger of Mezmo and participate in the discussion about the new DORA 2024 report. The 10th edition of the DORA report covers the impact of AI on software development, explores platform engineering’s promises and challenges, and emphasizes developer experience and stable priorities for success.

Key Takeaways from the 2024 DORA Report

Google recently released its 2024 Cloud DORA (DevOps Research and Assessment) report, bringing together a decade’s worth of trends, insights, and best practices on what drives high performance in software delivery across industries of all sizes. This year’s findings take a closer look at how DevOps teams can achieve greater resilience and efficiency by adopting AI, improving team well-being, and building powerful internal platforms.

Webinar Recap | Telemetry Data Management: Tales from the Trenches

Managing telemetry data effectively is a serious challenge for today’s engineering teams. In our webinar, Telemetry Data Management: Tales from the Trenches, experts from Mezmo and DZone shared practical strategies for building robust telemetry pipelines that both streamline operations and turn raw data into a strategic asset.

Regex vs Search Terms - Finding What You Need In Your Logs

This is an updated version of an earlier blog post that now includes links to our documentation. Full-text searches are a marvel of modern computing. In less than a second, search engines can match a query against hundreds of millions of documents. In the early days of search engines, you often had to use specific search operators and terms to get accurate results.

What are SLOs/SLIs/SLAs?

You’ve likely noticed how some pizza places promise delivery in 30 minutes, or they’ll give you your money back. But what are they really promising? They’re setting a clear performance goal and backing it up with confidence. How do they measure their performance? They track how long each delivery takes. And why do they make this promise? Because fast service is key to keeping their business thriving.
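To make the pizza analogy a little more concrete, here is a minimal Python sketch of how the measurement side might look. It treats the share of deliveries finished within 30 minutes as the indicator (SLI) and compares it with a target (SLO). The 30-minute threshold comes from the example above; the 90% target, the sample delivery times, and the names `delivery_sli` and `evaluate_slo` are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class SloReport:
    sli: float      # measured fraction of on-time deliveries
    target: float   # the objective we want the SLI to meet
    met: bool       # True when the SLI is at or above the target


def delivery_sli(durations_min, threshold_min=30.0):
    """Return the fraction of deliveries completed within the threshold."""
    if not durations_min:
        raise ValueError("need at least one delivery to compute an SLI")
    on_time = sum(1 for d in durations_min if d <= threshold_min)
    return on_time / len(durations_min)


def evaluate_slo(durations_min, target=0.9, threshold_min=30.0):
    """Compare the measured SLI with the target (the SLO)."""
    sli = delivery_sli(durations_min, threshold_min)
    return SloReport(sli=sli, target=target, met=sli >= target)


# Hypothetical delivery times in minutes, invented for this sketch.
deliveries = [22, 28, 31, 19, 25, 30, 45, 27, 24, 29]
report = evaluate_slo(deliveries, target=0.9)
print(f"SLI: {report.sli:.0%}, target: {report.target:.0%}, met: {report.met}")
```

In this framing, the money-back promise plays the role of the SLA: it is the consequence the business accepts when a delivery misses the threshold.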