
Alerting

Monitoring Social Signals to Reduce Alert Fatigue With SignalFx and PagerDuty

“I need to be notified if there’s a significant event ongoing with SignalFx.” This is what I tell my team. However, even though I’m the CTO of a monitoring company, creating the right set of alerts to keep me informed of incidents in progress or potential issues was harder than it seemed at first glance. Why?

Massachusetts Natural Gas Explosions: A Lesson in the Importance of Alert Automation

The pressure in the natural gas pipelines under three Massachusetts communities spiked to 12 times its normal level last week, just before the explosions and fires that destroyed dozens of homes and killed an 18-year-old man. Columbia Gas came under fire for its mismanagement of the incident. The NTSB says a Columbia Gas control room in Columbus, Ohio, registered pressures of 6 pounds per square inch (PSI) last Thursday in pipelines that are intended to carry just 0.5 PSI.

Saving lives by ensuring uptime of mission-critical IT at Gift of Hope

Gift of Hope Organ & Tissue Donor Network is a non-profit organ procurement organization (OPO) that coordinates organ and tissue donation and provides public education on donation in Illinois and northwest Indiana. As one of 58 OPOs that make up the nation’s donation system, Gift of Hope works with 180 hospitals and serves 12 million people in its donation service area.

Alert fatigue, part 2: alert reduction with Sensu filters & token substitution

In my previous post, I talked about the real costs of alert fatigue (the toll it can take on your engineers as well as on your business) and offered some suggestions for rethinking alerting. In part 2 of this series, I’ll share some best practices for fine-tuning Sensu to help reduce alert fatigue.

Connect Insights to Real-Time Action With PagerDuty Visibility

Have you ever gotten that dreaded text from your boss: “The site is down”? Maybe you were meeting with a customer. Or having dinner with your family. Maybe you were presenting at a conference. Doesn’t matter. Whatever else you were doing, now you’re doing emergency incident communication too. You check in with your team leads and confirm there is a problem. You let your boss know the response is under way.

How AI/ML Helps Retailers Keep 3 Promises This Holiday Season

Another holiday season will soon be upon us, and many retailers and eCommerce businesses are already making plans. As you take inventory of what you learned last holiday season, let’s start with some lessons the entire retail industry learned this time last year. In addition to stocking up on hot items and planning their promotions, the most competitive sites found that using AI/ML to optimize the customer experience not only kept customers happy but also dramatically increased revenue.

Alert fatigue, part 1: avoidance and course correction

Alert fatigue occurs when someone is exposed to a large number of frequent alarms (alerts) and consequently becomes desensitized to them. This problem is not specific to technology fields: people in most on-call roles, such as doctors, experience it in slightly different ways, but the underlying problem is the same.