
Alerting

Using AI to Auto-Detect and Remediate Incidents

Today, the number of possible failure modes in cloud and microservices applications is exploding, making it increasingly difficult to gain true observability and take the right action across IT environments. According to Lightstep’s Global Microservices Trends report, 91% of teams are using or have plans to use microservices, but 73% report it is harder to troubleshoot application performance problems due to greater complexity.

Taming the CMDB Beast is Finally Within Reach

Managing IT infrastructure today can feel like a game of Tetris. Operations staff are constantly managing the addition of new pieces, trying to quickly determine how to best position them while the clock is ticking before the next round drops. Ultimately, decisions made early on impact what comes later and vice versa.

Why Your Status Page Matters and How to Use It

When an outage hits your service, everybody starts talking. Your engineers are talking about what caused the problem, and how to fix it; your management is asking about when it’ll be fixed; and your customers are telling the world that they’re not happy. But there’s an even more important conversation you should be having: communicating with your users about the issue.

(Fish) Farm-to-Table Produce With PagerDuty

Most of us are familiar with the traditional farms that have existed since humans learned to sow and harvest crops—these farms have provided us with food for centuries. And for a long time, due to the lack of refrigeration and other technology, humans lived near their food sources. But industrialization has also led to centralization of farming systems, with farms getting larger and further from consumers and with distributors depending on preservatives or refrigeration to extend shelf life.

This is the Single Most Important Business KPI You Probably Aren't Even Monitoring

Having spoken with many companies, I’ve learned that while they all monitor their application performance, infrastructure, product usage, conversion rates and a variety of other user experience parameters, very few monitor the actual transactions from their payment provider.

Step-by-step guide to setting up Prometheus Alertmanager with Slack, PagerDuty, and Gmail

In my previous blog post, “How to Explore Prometheus with Easy ‘Hello World’ Projects”, I described three projects that I used to get a better sense of what Prometheus can do. In this post, I’d like to share how I got more familiar with Prometheus Alertmanager and how I set up alert notifications for Slack, PagerDuty, and Gmail.

Popular Mass Notification Solutions Used in Schools

OnPage BlastIT is a mass notification system that allows organizations to enhance their crisis communications. It streamlines communication in emergency situations, ensuring that critical, urgent alerts are never missed. Additionally, BlastIT allows organizations to improve mass messaging operations by 30 to 40 percent. Here, I’ll highlight BlastIT’s features and how they outweigh competitor functionalities.