The latest news and information on incident management, on-call, incident response, and related technologies.

New OnPage + ConnectWise Incident Alerting Workflow

OnPage has combined voicemail transcription with keyword-based triggers to identify and prioritize after-hours incidents. The new OnPage + ConnectWise workflow improves incident alert management for IT and managed IT clients by substantially reducing incident response times. By streamlining after-hours on-call communication, OnPage's critical alerting platform has transformed the on-call IT industry.

Rootly Raises $12 Million from Renegade Partners, Google Gradient Ventures, & XYZ Ventures

We are excited to announce that we have raised a $12M round of financing led by Renegade Partners, with participation from Google Gradient Ventures (Google’s AI-focused venture fund) and XYZ Ventures. This brings our total funding to date to $15.2M ($20M CAD), including our existing investors Y Combinator and 8VC.

July 2023 newsletter: Changelog - The Deluxe Edition

🎵 Gotta give the people, give the people what they want! 🎵 You've been asking. And we've been listening. Over the past few weeks, we've been shipping frequently requested features to help you bring your incident management to the next level. It may be the dog days of summer, but let's ignore that, yeah? Just take a look at this recent changelog. Note that this is the biggest one we've ever published.

From On-call to Non-call: Resolving Incidents Before They Even Happen

Artificial intelligence has captured the attention of the world, with tools like ChatGPT and large language models (LLMs) driving the conversation. But you don’t need to wait for the future or new features powered by LLMs to start working smarter—the tech industry has been investing in intelligent, automated tools for years and they’re ready for production now. In this talk, you’ll learn how the engineering teams at Toyota Connected use tools like Datadog Watchdog, Anomaly Detection, and Workflows to make our lives easier and keep our platform stable.

Tools and Trends in Site Reliability Engineering according to Gartner's 2023 Hype Cycle

Gartner recently published its report "Hype Cycle for Site Reliability Engineering, 2023." This blog reviews the future of site reliability engineering based on Gartner’s Hype Cycle. The OnPage team is also pleased that Gartner listed OnPage as a sample vendor in the Automated Incident Response category.

BigPanda's Resources for Navigating Change Through the AI Revolution

AI has revolutionized the way we engage online in 2023. From ChatGPT and AI art generators to healthcare, finance, and business, you can hardly read the news without seeing the latest proclamation of how AI is poised to change every aspect of our lives. AI has brought fundamental changes to how we live and work, and we’re still scrambling to understand their impact. Change can be difficult for people to embrace, especially where their work is concerned.

Getting Started with PagerDuty

In this video, you will gain a baseline understanding of what PagerDuty does and how to configure your PagerDuty account. To dive deeper into the PagerDuty platform, select relevant topics in our free, on-demand e-learning center at university.pagerduty.com. The PagerDuty Operations Cloud is essential infrastructure that detects and diagnoses disruptive events, mobilizes the right team members to respond, and automates workflows across your digital operations, so that your business moves forward faster. Get started now!

What's missing from your incident management workflow

The first fifteen minutes of an incident set the tone for the rest of the resolution process. What separates a rapid response from a stressful scramble is clear ownership, but ownership hasn't always been easy to ascertain. In this article, we’ll cover how Cortex, an internal developer portal, can serve as your team’s source of truth to accelerate the incident management process and reduce MTTR.

Exploring distributed vs centralized incident command models

Recently in our Better Incidents Slack channel, there’s been some discussion of how companies structure dedicated incident commanders: distributed or centralized. The way I see it, there are two types of commanders. The first is the temporary, distributed role: a hat that an on-call engineer or an engineering manager puts on during an incident. The second is the centralized, full-time role, where someone is the designated incident commander (or one of a few) for all incidents.

Synced for Success: OnPage & Slack for Incident Response

As the post-pandemic world finds its footing again, businesses are embracing a new era of technological innovation. In particular, IT teams are rapidly digitizing their processes, especially in incident response. Through virtual collaboration tools, remote IT support, and automated incident management, teams have found new ways to ensure business continuity while delivering IT services with minimal downtime.