The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Datadog and BigPanda: Observability and AIOps made better together

Datadog’s modern observability empowers development engineers with full-stack visibility, comprehensive instrumentation generation, and proactive alerts to accelerate software development releases and address potential incidents. While Datadog gives teams end-to-end visibility, it works even better together with AIOps from BigPanda – development teams gain insights into outside application dependencies and reliance on other systems.

10 Years of Failure Friday at PagerDuty: Fostering Resilience, Learning and Reliability

In today’s fast-paced and ever-evolving world of technology, failure is inevitable. Organizations should embrace failure as a learning opportunity for how to build and deliver more resilient services. At PagerDuty, we’ve practiced Failure Friday for 10 years now. Failure Friday, a practice inspired by the chaos engineering space, involves intentionally injecting failures into our systems to improve reliability and foster a proactive engineering culture.

The Unplanned Show, Episode 6: Defining AIOps with Heath Newburn

“AIOps” is a term some love to hate, but what makes it useful? In this episode, Heath Newburn breaks down the three things to look for in an AIOps solution: reduce noise, create context, and reduce toil. He also explains the challenges of domain-specific versus domain-agnostic approaches to AIOps. But even within that approach, Heath warns of “gotchas” in rules “tech debt”, data formats, and overall long implementation times.

We used GPT-4 during a hackathon: here's what we learned

We recently ran our first hackathon in quite some time. Over two days, our team collaborated in groups on various topics. By the end of it, we had 12 demos to share with the rest of the team. These ranged from improvements in debugging HTTP request responses to the delightful “automatic swag sharer.” Within our groups, a number of us tried integrating with OpenAI’s GPT to see what smarts we could bring to our product.

How summertime turns up the heat on cyber readiness (and what to do about it)

“Malicious cyber actors aren’t making the same holiday plans as you.” (CISA & FBI) Summertime is prime time for cyberattacks. According to one survey, 58% of security professionals believe that there is seasonality in the attacks that their company experiences every year, with the majority citing summer as high season for breaches.

In review: Gartner Hype Cycle for ITSM

The OnPage team is pleased to announce that we’ve been included in the latest Gartner® Hype Cycle for ITSM, 2023 report, which lists OnPage as a sample vendor in the Automated Incident Response category. For those unfamiliar with it, Gartner’s Hype Cycle for IT Service Management (ITSM) highlights tools and technologies that shape the ITSM ecosystem.

What's New in PagerDuty iOS and Android Mobile Applications

The PagerDuty Operations Cloud is your platform for action in critical moments. By harnessing the capabilities of AI and automation, it has the ability to detect and diagnose disruptive incidents, assemble the appropriate team members for prompt response, and optimize your digital operations by streamlining infrastructure and workflows.

Limitless Status Page Customization - Unlocked

Maintaining a comprehensive and engaging status page is the cornerstone of an effective incident communication strategy, yet too many companies limit themselves in this respect. Some rely on an assortment of disjointed application monitoring and manual incident notifications, while others look to the cheapest status page they can find.

Enhanced Incident Response: Maximizing Microsoft Teams with Squadcast

Of late, more and more businesses are relying on ChatOps tools like Microsoft Teams for a range of functions beyond simple communication. Incident management is no exception to this growing trend. However, Microsoft Teams alone may not possess all the necessary capabilities to efficiently perform these functions. To bridge this gap, integration with core applications becomes necessary.

Gartner Market Guide: Embedding Automation Into the Enterprise

“Existing workload automation strategies are unable to cope with the expansion in complexity of workload types, volumes and locations driven by evolving business demand, as per Gartner. Digital business is slowed without collaboration and automation inside and outside of IT, leading to siloes of capabilities across business and IT teams. Cost optimization is an evolving challenge, driven by technical debt and requirements to demonstrate business value of investments.”