Building a Privacy-First AI for Incident Management

At Rootly, we're integrating AI into incident management with a keen eye on privacy. It's not just about tapping into AI's potential; it's about ensuring we respect and protect our customers’ privacy and sensitive data. Here's a quick overview of how we're blending innovation with strong privacy commitments.

How to Monitor Network Failover: Fighting Against Downtime

The Internet is everywhere these days, woven into how businesses operate and connect with customers, partners, and colleagues. It's not just a luxury; it's a necessity. Keeping things running smoothly means having a network that's on its A-game all the time – no glitches allowed. Why? Well, network downtime isn't just an inconvenience; it's like a money-eating monster that also affects how people see your company.

New MTTX analytics to drive your reliability roadmap

Analytics are great. We can all agree there. But not all analytics are created equal. FireHydrant has long offered incident analytics dashboards that provide an in-depth look at the entire incident lifecycle. You can see how incidents impact services and teams, understand retrospective participation and completion, and even get insight into follow-ups. But great analytics do more than simply organize data. They help you tell a story.

Safer Client-Side Instrumentation with Honeycomb's Ingest-Only API Keys

We're delighted to introduce our new Ingest API Keys, a significant step toward enabling all Honeycomb customers to manage their observability complexity simply, efficiently, and securely. Ingest Keys are currently available for Environment & Services customers, with Classic support and programmatic key management capabilities under development and coming soon!

Powering Real-Time Data Processing with InfluxDB and AWS Kinesis

Imagine a data engineer working for a large e-commerce company tasked with building a system that can process and analyze customer clickstream data in real time. By leveraging Amazon Kinesis and InfluxDB, they can achieve this goal efficiently and effectively. So, how do we get from idea to finished solution? First, we need to understand the tools at hand.

Your Global Microsoft Teams Performance Action Plan

Our ‘Global Microsoft Teams Performance Trends’ report revealed some very interesting facts about enterprise usage of Teams. Using insight drawn from hundreds of thousands of Teams users, we’ve figured out the issues that plague certain regions. Don’t worry, we haven’t just filed that under “important stuff for later”. We’ve created an action plan your organization can use to make its Teams performance much better. Let’s dive straight in.

What is an alert?

Terms like ‘alert’ play an important role in understanding IT and OT operations. There is usually an abundance of interpretations and definitions. You will also find different naming conventions with each vendor of tools for monitoring and service management. So, let’s dive in. How is an alert defined? Some define alerts as events that meet a certain threshold, have a specific relevance (as in ITIL – events of warning/alert type) or require action.

What is an event?

Terms like ‘event’ play an important role in understanding IT and OT operations. There is usually an abundance of interpretations and definitions. You will also find different naming conventions with each vendor of tools for monitoring and service management. So, let’s dive in. How does ITIL (Information Technology Infrastructure Library) define an event? ITIL links events and notifications directly by saying.

The Real Cost of Synthetic User Testing with AWS

Every time I share a project using SaaS tools, someone inevitably responds that they could do the same thing on their own home server ‘for free.’ I mention this not because it is annoying, since I would never go on social media at all if annoying responses were allowed to change my behavior, but because I think it points to a basic misconception that still affects DevOps practitioners today: the refusal to accurately estimate the real costs of self-managed solutions.