
Incident Management

The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Improved PagerDuty Integration with Detailed Alerts

AppSignal now supports the next API version of PagerDuty. 🎉 One of our devs was on support rotation the other day, and a customer asked whether we could add support for the next API version of PagerDuty. We won’t tell you who it was, but this developer typically answers questions by solving things as quickly as he can. So, two days later, boom! The improved integration for PagerDuty went live.

More Chatbots - Slack, Mattermost, Microsoft Teams, and Google Chat

Today, we are excited to announce PagerTree has added 3 new chatbot services: Mattermost, Microsoft Teams, and Google Hangouts Chat (this is in addition to our core Slack notification channel). Chatbots are available on all pricing tiers free of charge! :) If you don’t already have an account, sign up for a free trial now. Our chatbots will post alert details to a “channel” of your choice.

Better Than 'Business As Usual': Rethinking How PagerDuty Works in a Post-COVID-19 World

Earlier this year, as COVID-19 appeared, our global community of almost 800 employees became a fully remote workforce—effectively overnight. Now, all of us have had a taste of what it’s like to work from home all the time, from embracing the benefits of less time commuting and more time with our families, to the downsides of feeling isolated and missing seeing our colleagues in “real life.”

Using Observability to Inspect and Adapt CI/CD Pipelines

In this blog post series, I’ve explored the relationship between observability and a set of software delivery lifecycle practices that help organizations adopt DevOps and change their ways of working from project-centric to product-centric. I started with Site Reliability Engineering, then considered Value Stream Management (VSM), and finish with this post on Continuous Integration and Delivery (CI/CD).

Thales accelerates incident resolution & decreases downtime with Exigence

Thales Cloud Protection & Licensing, part of the Thales Group, was looking to improve how it handles critical incidents. Whenever an incident hit, just gathering the incident team was a cumbersome and time-consuming task that involved a lot of manual work. Multiple calendar invites would be sent to different people inside and outside the organization, multiple times, urging them to join calls and meetings.

Let's Talk AIOps: Part 1: What IS AIOps, Exactly?

This is the first in a two-part blog series deconstructing AIOps for ITOps leaders. If you gave me a dollar for every company that claims that they use “A.I.,” I’d be doing pretty well. But as a marketer, I can’t help but be a little skeptical about those claims. Let me explain.

How to Improve the Reliability of a System

Site reliability engineering is a multifaceted movement that combines many practices, mentalities, and cultural values. It looks holistically at how an organization can become more resilient, operating on every level from server hardware to team morale. At each level, SRE is applied to improve the reliability of relevant systems. With such wide-reaching impact, it can be helpful to take time to reevaluate how to improve the reliability of a system.

Working with multiple on-call teams using Zabbix and iLert

This post outlines how to use Zabbix and iLert with multiple on-call teams, where each team is responsible for a set of host groups in Zabbix and therefore receives alerts only for the services it is responsible for. But first, let’s start with the basic needs of being on call.

Industry Experts Explain how to Thrive in a Post-COVID World

With complex architectures, gaining visibility into systems is becoming more difficult. Additionally, with the move to remote work, it’s more important than ever before to adapt to new modes of work such as asynchronous collaboration. So how do we adjust to these changing times? In a CIO panel hosted by Lightspeed Venture Partners, industry experts came together to discuss these questions. Below are key insights from their conversation.

Retail Industry Trends 2020: All-In on Digital Since COVID-19

This is the first in a series of posts we’ll be publishing on trends we’re seeing in the retail industry and how IT organizations tasked with deploying and maintaining flawless digital customer experiences can take advantage of PagerDuty to ensure always-on reliability. It’s been a tough year for retail.