Achieving SLO Success with Golden Signals and Reliability Testing

The four Golden Signals (latency, traffic, errors, and saturation) are an easy and effective way to measure the most important aspects of a system. Paired with a reliability management platform like Gremlin, they help you stay ahead of your SLOs so you can meet your legal obligations and deliver the perfect customer experience.
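As a rough illustration of what measuring the four Golden Signals (latency, traffic, errors, and saturation) can look like, the sketch below summarizes one observation window of requests. It is a minimal example under stated assumptions, not Gremlin's method or any vendor's API: the `Request` record, the `golden_signals` function, the choice of p95 for latency, the "5xx means error" rule, and the use of CPU utilization as the saturation measure are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Request:
    """One served request (hypothetical record shape)."""
    latency_ms: float
    status: int


def golden_signals(requests: List[Request],
                   window_seconds: float,
                   cpu_utilization: float) -> Dict[str, float]:
    """Summarize the four Golden Signals for one observation window."""
    count = len(requests)
    latencies = sorted(r.latency_ms for r in requests)

    # Latency: nearest-rank 95th percentile of request durations.
    if latencies:
        rank = max(1, math.ceil(0.95 * count))
        p95 = latencies[rank - 1]
    else:
        p95 = 0.0

    # Errors: assumption that any 5xx status counts as a failed request.
    errors = sum(1 for r in requests if r.status >= 500)

    return {
        "latency_p95_ms": p95,
        "traffic_rps": count / window_seconds,            # Traffic: requests per second
        "error_rate": errors / count if count else 0.0,   # Errors: failed fraction
        "saturation": cpu_utilization,                    # Saturation: assumed CPU share, 0..1
    }


if __name__ == "__main__":
    sample = [Request(100.0, 200)] * 9 + [Request(900.0, 500)]
    print(golden_signals(sample, window_seconds=10.0, cpu_utilization=0.5))
```

Each value can then be compared against whatever SLO target applies, for example an error rate ceiling or a latency threshold chosen by the team.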

AI at the Peak of Inflated Expectations? A Reality Check

The AI hype is undeniable. Buzzwords like ‘machine learning’, ‘deep learning’, and ‘artificial intelligence’ have permeated boardrooms, media, and tech conferences. However, recent market movements suggest that AI might be at the ‘peak of inflated expectations’. Nvidia, a leading player in AI hardware, has seen its stock plummet by about 20% over the last month (8th July to 8th August 2024).

Grafana 11.2 release: new updates for data sources, visualizations, transformations, and more

The Grafana 11.2 release ushers in a new wave of Grafana data sources, updates to visualizations and transformations, and more capabilities in Grafana Alerting as well as authorization and authentication. Plus, for those who are looking to move from on-premises to cloud, there is a new migration assistant for Grafana Cloud in public preview. For even more details about all the changes in this release, refer to the changelog or the What’s New documentation.

On-Call Rotations and Schedules: A Guide for 2024

In an increasingly connected world where businesses operate around the clock, the importance of having an effective on-call system cannot be stressed enough. With technological advances and the expectation of immediate attention to business-critical issues, creating a reliable on-call rotation and schedule is essential for ensuring operational continuity. This comprehensive guide will walk you through the various aspects of on-call rotations and schedules that you need to consider for 2024.

Customer Survey 2024: Unveiling insights and impact

We’re delighted to share the results of our 2024 Annual Customer Survey. Participants from some of the world’s most innovative companies shared their insights and experiences, highlighting our growing impact, impressive ROI, increased customer satisfaction, and broad adoption across various teams. Learn the key trends from the survey and how Catchpoint ensures Internet Resilience for some of the world’s most innovative companies.

Reduce SNMPv3 Trap Volume With Cribl Lookups

Despite new technologies and telemetry formats, like Model-driven Telemetry/Streaming Telemetry and OpenTelemetry, SNMP traps continue to be a significant source of events for monitoring teams. If you’ve been in IT operations, you’ve likely had a request to parse SNMP traps into a human-readable format so that they can be analyzed, probably deduplicated, and passed to a ticketing system for triage and remediation. The challenge? SNMP traps can be excessively chatty.
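To make the parse-and-deduplicate idea concrete, here is a small generic sketch. It is not Cribl's lookup configuration and does not use any Cribl API; it only illustrates the general technique of translating trap OIDs through a lookup table and suppressing repeats. The `dedupe_traps` function, the tuple layout of a trap, and the 60-second window are assumptions made for the example. The two OIDs shown are the standard `linkDown` and `linkUp` notifications.

```python
from typing import Dict, List, Tuple

# Lookup table from trap OID to a human-readable name.
OID_NAMES: Dict[str, str] = {
    "1.3.6.1.6.3.1.1.5.3": "linkDown",
    "1.3.6.1.6.3.1.1.5.4": "linkUp",
}

# A trap is (timestamp in seconds, source device, trap OID); hypothetical layout.
Trap = Tuple[float, str, str]


def dedupe_traps(traps: List[Trap], window_seconds: float = 60.0) -> List[dict]:
    """Forward a trap only if the same (source, OID) pair was not
    forwarded within the previous `window_seconds`."""
    last_forwarded: Dict[Tuple[str, str], float] = {}
    kept: List[dict] = []
    for ts, source, oid in sorted(traps):
        key = (source, oid)
        previous = last_forwarded.get(key)
        if previous is None or ts - previous >= window_seconds:
            kept.append({
                "time": ts,
                "source": source,
                # Fall back to the raw OID when the lookup has no entry.
                "event": OID_NAMES.get(oid, oid),
            })
            last_forwarded[key] = ts
    return kept


if __name__ == "__main__":
    link_down = "1.3.6.1.6.3.1.1.5.3"
    traps = [
        (0.0, "sw1", link_down),
        (10.0, "sw1", link_down),   # repeat inside the window, suppressed
        (20.0, "sw2", link_down),   # different device, forwarded
        (70.0, "sw1", link_down),   # window elapsed, forwarded again
    ]
    for event in dedupe_traps(traps):
        print(event)
```

Keying on the device and OID together keeps a chatty interface on one switch from hiding the same event on another; a real pipeline would likely include more fields in the key, such as the affected interface.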

How to Choose Workflow Management Software for Your Business

Every team faces challenges in its daily operations that reduce its efficiency and keep it from hitting its weekly, monthly, and quarterly KPIs. The cause could be miscommunication, repetitive tasks that require manual input, or a disjointed team. Fortunately, a good workflow management tool can help you streamline all your business tasks, improve team connectivity, and reduce operational errors.

Common Kafka Security Pitfalls and How to Avoid Them

You ever get that nagging feeling that maybe, just maybe, you’ve missed something crucial in a project? When it comes to deploying Apache Kafka, that “something” often turns out to be security. I’ve been there myself, thinking everything was running smoothly, only to realize later that I’d left the door wide open for potential security issues. Kafka is powerful, but it’s easy to overlook some key security measures if you’re not careful.

Evolving solutions for IT operations teams

ITOps teams face several common issues, from high noise and incident volumes to siloed teams and manual workflows. These challenges contribute to reduced operational efficiency, extended downtimes, and lost revenue, all of which you want to avoid. You rely heavily on incident response teams to keep your part of the digital world running smoothly. The BigPanda platform helps ITOps and incident response teams accelerate and automate incident detection, investigation, and resolution.

10 Incident Management Metrics to Monitor and Improve Your Service

In the world of IT Service Management, the ability to effectively manage incidents is crucial to maintaining business continuity and customer satisfaction. That's why it's always a good idea to track Incident Management metrics from the start. We all know that incidents, ranging from minor service disruptions to major outages, can have significant impacts on an organization's operations and reputation.