
Latest News

The Ultimate, Free Incident Retrospective Template

Incident retrospectives (or postmortems, post-incident reports, RCAs, etc.) are the most important part of an incident. This is where you take the gift of that experience and turn it into knowledge. This knowledge then feeds back into the product, improving reliability and ensuring that no incident is a wasted learning opportunity. Every incident is an unplanned investment and teams should strive to make the most of it.

5 Tips For Better On-Call Support (in 2020)

Your enterprise needs on-call support, but it often struggles to achieve the results it wants. Yet the longer your enterprise waits to improve its on-call support processes and procedures, the greater the risk that a minor outage could cause substantial downtime. Ultimately, your enterprise needs seamless on-call support processes and procedures.

A Closer Look at PagerDuty's New AIOps Capabilities

Another PagerDuty Summit is in the books, and we’re still coming down from the excitement and energy our customers and community showed us over the past week. We made several big announcements over the course of the conference, but none more significant than the AIOps advancements on our digital operations platform. We introduced a number of ways customers can apply machine learning algorithms and automation to a wide range of workflows across the platform.

Any PLC alarm on your mobile device

Maintenance of machines is an incredibly important task, and it is important to fix a machine before it completely fails. In reactive maintenance scenarios, speed of response is key. Once an issue is detected, it is important to communicate it as reliably and quickly as possible to the right engineer. Ideally, the machine is connected directly to the team of mobile engineers in charge and can let them know exactly what happened and what needs to be fixed.

The incident resolution mandate of telehealth and telepharmacy providers in the age of Covid-19

The incident management challenges of a pandemic-driven world and how to overcome them.

“While the safety and well-being of workers affected by COVID-19 is the first priority, companies will also triage other essentials, such as incident management and stakeholder communications.” (PwC)

In a pandemic-stricken world that consumes products and services over the internet more than ever, there is great strain on digital and connectivity systems.

Here's your Complete Definition of Software Reliability

We live in the era of software convenience, where we take for granted that hundreds of services are always at our fingertips. These applications become part of our daily routines because they are so reliable. However, this consistency makes reliability work invisible to the end user. It can be difficult to appreciate the effort behind maintaining a high-availability service. Because of that, people may misunderstand exactly what makes a service reliable.

Best PagerDuty Alternatives of 2020: An Independent Review by StatusGator

Modern applications offer more and more features, and the infrastructure needed to run them becomes increasingly complex. The need for Application Performance Monitoring (APM) and Network Performance Monitoring (NPM) tools like PagerDuty is obvious, as the cost of downtime can be exorbitant for a business of any scope. Thus, every business needs to use PagerDuty or one of its alternatives to alert the Ops team should anything go awry.

Transparency Under the Hood: Self-service Integration Diagnostics

As many recent studies show (like this one from McKinsey), self-service in B2B products is a growing trend. Today’s enterprise users expect the same seamless and simple experience they’ve learned to love as consumers. This works well for many simple tasks. But when it comes to more complex actions that require working with ‘under the hood’ technical features, things haven’t changed much since the early days of enterprise technology.

Build your API first

I have a beef with companies that don’t expose nearly everything their product can do with an API. I get anxious wondering, “Why can I only do some of the things via the API? How is this sausage made?” Sure, there are plenty of examples of endpoints that shouldn’t be exposed; changing passwords, for instance, should probably be kept private. Regardless, there are tons of examples of products where I can type in a field in the UI, but that field isn’t available in the API.

Escalate Critical Issues with PagerDuty and Sentry

Connecting Sentry and PagerDuty is a great way to make sure important issues don’t get stuck in backlog purgatory. But sometimes there’s a drop-everything critical issue that can’t wait for a sprint planning meeting. That’s why we’re extending our PagerDuty integration to support Metric Alerts.