
The latest news and information on incident management, on-call, incident response, and related technologies.

Monitoring and incident management: a winning combination

Monitoring systems gather and log a wide range of performance data from diverse targets, including applications, user experience, networks, servers, and more. Monitoring is usually conducted under real runtime conditions, but synthetic monitoring can also be used to simulate load and test the resilience of web services, for example.

Connect the Right Teams and Work Together to Quickly Resolve Customer Issues

Today at PagerDuty Summit 2019, we announced PagerDuty for Customer Service, a powerful new way to connect Customer Service teams to engineering and IT teams. We were also excited to debut two new partner integrations, with Zendesk and Salesforce Service Cloud. We can’t wait to show users how PagerDuty and our customer service ecosystem partners help connect the right teams so they can work together and resolve issues quickly, reducing customer impact.

Summit Day One: Delivering New Machine Learning Capabilities to Cut Costs and Outages

At PagerDuty, we innovate continually (check out our What’s New page for the latest updates). But while we ship product continuously, we also save a number of new and improved capabilities to share with our customers at PagerDuty Summit, our annual customer event.

Announcing General Availability of PagerDuty's Slack Integration

When PagerDuty’s VP of Product Management Rachel Obstler announced the beta version of our new Slack integration in April in her “Anticipating, Monitoring, and Managing Incidents via Slack” panel at Slack Frontiers, we expected significant interest in the integration among our customers.

Open Source can be a silver bullet, but your application might be a werewolf

I was reminiscing with an old co-worker about an incident that happened at a past job. You know the one: you install a library that makes some task of yours simple, only to discover that the library makes things worse. This incident in particular involved the way that images were served out of our Ruby on Rails application, and the library that made it possible to “easily resize before serving” them.

Service-Based vs. Team-Based Approach: Which Is Better?

How is the incident response process set up at your organization? At PagerDuty, our approach is to look holistically at your infrastructure, your customer-facing applications, and your products. We describe these items as “services” that roll up into a “business service.” This setup allows teams to better manage these services so that when incidents do happen, responders can gain context much faster. But how?