Alerting

Achieve Better Accountability With Full-Service Ownership

Software teams seeking to provide better products and services must focus on faster release cycles. But running reliable systems at ever-increasing speeds presents a big challenge. Software teams can have both quality and speed by adjusting the policies around ongoing service ownership. While on-call plays a large part in this model, advancement in knowledge, more resilient code, increased collaboration, and practice also mean engineers don’t have to wake up to a nightmare.

What is a post mortem incident? How can we monitor this?

I particularly liked the article that our colleague Sara Martin wrote on the Pandora FMS blog about crisis management in information technology, which lays out the steps involved. This article starts from step five: when, after a period of recovery, the crisis has been solved and becomes a post mortem incident. The term comes from Latin and means “after death”.

How to Manage a Critical Incident

Critical incidents don’t come with a predetermined schedule or warning. So, it’s up to your organization to have an incident response procedure in place to combat these crises. Don’t have one? Read below to perfect your incident response operations and adopt the right tools and procedures to fight against any critical event.

October 2019 Update: Improved usability and new emergency alarm triggering in the app

The October update of the mobile app brings the usability improvements described below, along with a new feature. The user interface has been improved so that you can now decide for yourself whether to see all Signls on the dashboard grouped by category. Using the “More” button on the dashboard’s Signl widget, you can open a context menu and show or hide the categories.

Blameless Culture Key to Addressing Outage Outrage in Australia

After the unfortunate Commonwealth Bank of Australia outage last week, the powerful Payment Systems Board, whose members include the chairs of the RBA and APRA, announced it would make all outage data public to prevent banks, payment schemes, and telecommunications carriers from “hiding behind” the performance statistics shared by each institution.

AIOps in the Spotlight: What It Offers & Why It Matters

Today, most IT teams find themselves facing a number of challenges presented by the new and increasingly complex infrastructure that accompanies digitization, including an exponential increase in data volumes and types. In fact, Gartner estimates that the data volumes generated by IT infrastructure and applications are increasing two- to three-fold every year (and that’s compounding growth). There’s clearly too much data for the humans on the IT team to sort through on their own.