
Incident Management

The latest news and information on Incident Management, On-Call, Incident Response, and related technologies.

Latest Developments in Monitoring and Observability, 2023

You know it’s going to be a great day when you find yourself mentioned as a Sample Vendor on the Gartner® Hype Cycle™ report for Monitoring and Observability, 2023 (July 2023). The OnPage team is thrilled to share with its community that we have been mentioned as a Sample Vendor by Gartner on their latest Hype Cycle for Monitoring and Observability. OnPage is recognized as a Sample Vendor, specifically within the Automated Incident Response category.

210% ROI: unlocking the economic value of FireHydrant for incident management

In the fast-paced high-tech industry, efficient incident management is a critical factor in maintaining brand reputation, employee morale, and most importantly, your bottom line. Good practices can result in reduced downtime, increased learning opportunities from incidents, and an enhanced reputation among both the engineering community and customers. But quantifying the true cost of incidents has always been a challenge — until now.

Datadog and BigPanda: Observability and AIOps made better together

Datadog’s modern observability empowers development engineers with full-stack visibility, comprehensive instrumentation generation, and proactive alerts to accelerate software development releases and address potential incidents. While Datadog gives teams end-to-end visibility, it works even better together with AIOps from BigPanda – development teams gain insights into outside application dependencies and reliance on other systems.

10 Years of Failure Friday at PagerDuty: Fostering Resilience, Learning and Reliability

In today’s fast-paced and ever-evolving world of technology, failure is inevitable. Organizations should embrace failure as a learning opportunity for how to build and deliver more resilient services. At PagerDuty, we’ve practiced Failure Friday for 10 years now. Failure Friday, a practice inspired by the chaos engineering space, involves intentionally injecting failures into our systems to improve reliability and foster a proactive engineering culture.

The Unplanned Show, Episode 6: Defining AIOps with Heath Newburn

“AIOps” is a term some love to hate, but what makes it useful? In this episode, Heath Newburn breaks down the three things to look for in an AIOps solution: reduce noise, create context, and reduce toil. He also explains the challenges of domain-specific versus domain-agnostic approaches to AIOps. But even within that approach, Heath warns of “gotchas” in rules “tech debt”, data formats, and overall long implementation times.

We used GPT-4 during a hackathon: here's what we learned

We recently ran our first hackathon in quite some time. Over two days, our team collaborated in groups on various topics. By the end of it, we had 12 demos to share with the rest of the team. These ranged from improvements in debugging HTTP request responses to the delightful “automatic swag sharer.” Within our groups, a number of us tried integrating with OpenAI’s GPT to see what smarts we could bring to our product.

How summertime turns up the heat on cyber readiness (and what to do about it)

“Malicious cyber actors aren’t making the same holiday plans as you.” (CISA & FBI) Summertime is prime time for cyberattacks. According to one survey, 58% of security professionals believe that there is seasonality in the attacks that their company experiences every year, with the majority citing summer as high season for breaches.

In review: Gartner Hype Cycle for ITSM

The OnPage team is pleased to announce that we’ve been included in Gartner®’s latest Hype Cycle for ITSM, 2023 report, which lists OnPage as a Sample Vendor in the Automated Incident Response category. For those unfamiliar with it, Gartner’s Hype Cycle for IT Service Management (ITSM) highlights tools and technologies that shape the ITSM ecosystem.

What's New in PagerDuty iOS and Android Mobile Applications

The PagerDuty Operations Cloud is your platform for action in critical moments. By harnessing the capabilities of AI and automation, it can detect and diagnose disruptive incidents, assemble the appropriate team members for prompt response, and optimize your digital operations by streamlining infrastructure and workflows.

Limitless Status Page Customization - Unlocked

Maintaining a comprehensive and engaging status page is the cornerstone of an effective incident communication strategy, yet too many companies limit themselves in this respect. Some rely on an assortment of disjointed application monitoring and manual incident notifications, while others look to the cheapest status page they can find.