A Day in the Life of a Mezmo SRE

What keeps an SRE at the top of his game? I had an insightful conversation with Jon Duarte, a Site Reliability Engineer (SRE) at Mezmo, and he walked me through his role and the tasks he manages on a typical day. Here, Jon offers a brief glimpse into the challenges he faces, the thought processes behind his approach, and the innovative solutions SREs come up with.

New GenAI Search Revamps Customer Experience

Splunk has launched a GenAI summary feature on the splunk.com and docs.splunk.com search platforms, designed to give users a quick and accurate view of the most pertinent information they are looking for. The feature serves up a contextual, high-level summary drawn from relevant search results on topics ranging from Splunk product and feature usage to general Splunk terminology.

Achieving SLO Success with Golden Signals and Reliability Testing

The four Golden Signals are an easy and effective way to measure the most important aspects of a system, and when paired with a reliability management platform like Gremlin, they help you proactively meet your SLOs so you can meet your legal obligations and deliver the perfect customer experience.

AI at the Peak of Inflated Expectations? A Reality Check

The AI hype is undeniable. Buzzwords like ‘machine learning’, ‘deep learning’, and ‘artificial intelligence’ have permeated boardrooms, media, and tech conferences. However, recent market movements suggest that AI might be at the ‘peak of inflated expectations’. Nvidia, a leading player in AI hardware, has seen its stock plummet by about 20% over the last month (8th July to 8th August 2024).

Grafana 11.2 release: new updates for data sources, visualizations, transformations, and more

The Grafana 11.2 release ushers in a new wave of Grafana data sources, updates to visualizations and transformations, and more capabilities in Grafana Alerting as well as authorization and authentication. Plus, for those who are looking to move from on-premises to cloud, there is a new migration assistant for Grafana Cloud in public preview. For even more details about all the changes in this release, refer to the changelog or the What’s New documentation.

On-Call Rotations and Schedules: A Guide for 2024

In an increasingly connected world where businesses operate around the clock, the importance of having an effective on-call system cannot be stressed enough. With technological advances and the expectation of immediate attention to business-critical issues, creating a reliable on-call rotation and schedule is essential for ensuring operational continuity. This comprehensive guide will walk you through the various aspects of on-call rotations and schedules that you need to consider for 2024.

Customer Survey 2024: Unveiling insights and impact

We’re delighted to share the results of our 2024 Annual Customer Survey. Participants from some of the world’s most innovative companies shared their insights and experiences, highlighting our growing impact, impressive ROI, increased customer satisfaction, and broad adoption across various teams. Learn the key trends from the survey and how Catchpoint ensures Internet Resilience for some of the world’s most innovative companies.

Reduce SNMPv3 Trap Volume With Cribl Lookups

Despite new technologies and telemetry formats, like Model-driven Telemetry/Streaming Telemetry and OpenTelemetry, SNMP traps continue to be a significant source of events for monitoring teams. If you’ve been in IT operations, you’ve likely had a request to parse SNMP traps into a human-readable format so that they can be analyzed, probably deduplicated, and passed to a ticketing system for triage and remediation. The challenge? SNMP traps can be excessively chatty.

How to Choose Workflow Management Software for Your Business

Every team faces challenges in its daily operations that affect its operational efficiency, preventing it from hitting its weekly, monthly, and quarterly KPIs. It could be miscommunication, repetitive tasks on their to-do list requiring manual input, or a disjointed team. Fortunately, a good workflow management tool can help you streamline all your business tasks, improve team connectivity, and reduce operational errors.

Common Kafka Security Pitfalls and How to Avoid Them

You ever get that nagging feeling that maybe, just maybe, you’ve missed something crucial in a project? When it comes to deploying Apache Kafka, that “something” often turns out to be security. I’ve been there myself, thinking everything was running smoothly, only to realize later that I’d left the door wide open for potential security issues. Kafka is powerful, but it’s easy to overlook some key security measures if you’re not careful.