
Incident Management

The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Mastering Digital Operations Across the Enterprise

I’m excited to announce that today, PagerDuty is taking our automation capabilities to new scale and scope as we enter into a definitive agreement to acquire Catalytic. With their technology and talented team we accelerate the delivery of enterprise-wide process automation that manages no-code workflows across the business, broadly applicable to any workflow, for any employee.

Postmortems Now Called Retrospectives in Blameless

Something big happened at Blameless this month: our “Postmortem” feature was renamed “Retrospective”. To the naysayers, I suppose you’re thinking, “This seems trivial. Different teams call it different names anyway, so why bother making the change?” First let me say, thank you for reading our blog, and I hope you read this one through to the end. Now, allow me to explain our reasoning and why we’re excited about this update.

Customizing Error Pages (Nginx Ingress Controller)

The most common way to do it, which is part of the official solution, is to create a Docker image for a server that responds to every request with 404 content, except for /healthz and /metrics. This could be an Nginx instance. /healthz should return 200. /metrics is optional, but it should return data that Prometheus can read, in case you are using it for k8s metrics. Note: Nginx can provide some basic data that Prometheus can read. / returns a 404 with your custom HTML content.
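The routing rules described above can be sketched in a few lines. This is a minimal illustration in Python rather than the Nginx instance the excerpt suggests. The HTML body, the metric name `default_backend_up`, the port, and the helper names `route` and `serve` are all assumptions made for the example, not part of the Ingress Controller itself.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical custom 404 page; replace with your own HTML content.
NOT_FOUND_HTML = b"<html><body><h1>404</h1><p>Page not found.</p></body></html>"

# Hypothetical metric in the Prometheus text exposition format.
METRICS_TEXT = b"# TYPE default_backend_up gauge\ndefault_backend_up 1\n"


def route(path: str):
    """Map a request path to (status, content_type, body)."""
    path = path.split("?", 1)[0]  # ignore any query string
    if path == "/healthz":
        # Health check endpoint: must return 200.
        return 200, "text/plain; charset=utf-8", b"ok"
    if path == "/metrics":
        # Optional endpoint: plain-text data that Prometheus can scrape.
        return 200, "text/plain; version=0.0.4; charset=utf-8", METRICS_TEXT
    # Everything else, including "/", gets the custom 404 page.
    return 404, "text/html; charset=utf-8", NOT_FOUND_HTML


class DefaultBackendHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, content_type, body = route(self.path)
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Run the backend until interrupted (the port is an assumption)."""
    HTTPServer(("", port), DefaultBackendHandler).serve_forever()
```

Calling `serve()` starts the server. Keeping the routing in a plain function makes the three cases easy to check without opening a socket.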

Workflow Form Layout - xMatters Support

In xMatters, the form layout is where you customize the content and options that are available to the message sender. You can use the form layout to do things like predefine recipients for your messages, add a conference bridge, attach documents, specify a customized sender display name, or add a map that the sender can use to target users at specific sites.

Alert Fatigue in SRE: What It Is & How To Avoid It

Wondering about alert fatigue? We describe what it is, how it affects software development teams, and how to avoid it. What is alert fatigue? Alert fatigue is the phenomenon of employees becoming desensitized to alert messages because of the overwhelming volume and the number of false positives they receive. The risk with alert fatigue is that important information will be overlooked or ignored.

The BigPanda ScaleUp Journey: Human/AI Collaboration, Predictive Accuracy, and Scale Power in AIOps

At the beginning of the COVID-19 pandemic, we anticipated a slow-down in IT-related spending. In reality, the opposite occurred. Companies massively expanded their digital offerings using the same IT staff they’d had pre-pandemic, even as the teams lost access to many of their existing tools while working from home. This acceleration put immense pressure on IT teams everywhere, resulting in messy incident management, outages, and a huge shortage of talent.

xMatters Out Run Release Recap: Service-centric Automations, Callable Flows, and More!

What’s one of the fundamental principles of DevOps? Automation. There are many ways to leverage automation to facilitate DevOps practices for enabling consistency, reliability, and efficiency within the organization. That’s why we’re taking serious strides to ensure that xMatters can allow full automation and coordination of the many tools we use to make incident management easier and more efficient for front-line responders.

Creating Subscription Forms - xMatters Support

In xMatters, you can use subscriptions to ensure that you are always informed about certain events. These subscriptions will send you notifications whenever an event occurs that matches your pre-determined criteria, even if you are not directly targeted to receive a notification for that event.