The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Making the most of alerting in Opsgenie

For customer-facing SaaS companies, setting up an alerting tool is a no-brainer. In the current climate of always-on services, companies need assurance that customers are getting the service they demand and expect – all the time. But many organizations still struggle to notify the right people at the right time. If your data center is on fire and you alert Karen while she’s vacationing in the Greek Isles, you (and poor Karen) have a problem.

Announcing our Statuspage.io integration

Ever go to a status page that says everything is operational when it definitely isn't? You refresh maddeningly, thinking the problem might be on your end. You wonder whether the internet bill has been paid. Then, as a last resort, you check Twitter only to discover that hundreds of people are experiencing the same problem. This happens all too often, which is why we're happy to release our integration with Statuspage.io!

Healthcare IT Trends and Challenges

Technology and digitization are disrupting every industry, and healthcare is no exception. In this time-critical industry, patient care needs to be efficient and convenient. This is increasingly evident in the rise of individualized healthcare through direct-to-consumer (D2C) and convenience care models, such as telemedicine, that let patients find doctors, pay bills, schedule appointments, order prescription refills, receive consultations, and more.

Introducing the incident communication template generator

When things go wrong, your users need to know – but it’s not always easy to determine what to say or how to say it. If you’re responsible for getting the word out to hundreds or thousands of users, it can feel like a heavy weight on your shoulders. The task at hand is urgent, yet must be handled delicately. As someone who’s handled incident communication on Statuspage’s status page – the mother of all status pages – I know how difficult these moments are.

Understanding Systemic Issues: The PagerDuty Health Check Process

Continuous improvement is one of the fundamental tenets of Agile methodology that PagerDuty's product development teams emphasize. This already works fairly well at the individual team level through retrospective meetings and postmortems, but sometimes we don't notice larger, systemic issues that are outside the control of a single team. This post shares the process we use at PagerDuty to uncover those issues, the outcomes we have seen, and how the process has evolved.

August 2019 Update: Mobile Alert Dashboard and PSD2 Support

Our August update makes SIGNL4 ready for the new "Payment Services Directive 2" (PSD2). We have also extended the mobile alert dashboard and added new metrics: the enhanced dashboard in the SIGNL4 mobile app now shows alert counters per 'services & systems' category. Here are the details.

Introducing Detailed History and Resend Capabilities for Emergency Callouts

Emergency callouts are some of the most important notifications a user can receive. With the optional 'Emergency Callout' add-on, the ability to reliably alert and notify large numbers of employees can become part of your Enterprise Alert installation. These callouts can warn users about dangerous situations such as inclement weather, a fire in the building, or even security threats like an active shooter on the premises. It is imperative that users receive these notifications.

Optimizing Business Response When Technical Incidents Happen

Most technical incident response plans account for stakeholder communications, covering both internal teams and external customers. But at PagerDuty, we've learned from our customers that there's still a painful and expensive gap in alignment between IT and business teams. To close that gap, we need to focus on what incident response means for business teams.