
Alerting

Reimagining Retail Incident Response for the Holidays

The holiday season is here, and global retailers are prepared for the biggest retail event of the year. The decrease in new COVID-19 cases, coupled with a rise in vaccination rates, provides a glimmer of hope for shoppers looking to spend on friends and family. Holiday spending is expected to break previous records this year, growing up to 10.5 percent over 2020.

Best Practices to implement in Incident Management

An incident typically moves through five stages:

1. Assess impact
2. Inform customers (status page)
3. Identify the issue
4. Mitigate the issue
5. Resolve the incident

Then there is follow-up and further work. Note that step 2 should be ongoing as you progress: update the status page at reasonable intervals, e.g. every 15-20 minutes unless you specify otherwise.
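To make the update cadence concrete, here is a minimal Python sketch of a check that flags when a status page update is overdue. The `Stage` enum, the `status_update_overdue` function and the 20-minute default are illustrative assumptions based on the stages listed above, not part of any particular incident management tool.

```python
from datetime import datetime, timedelta
from enum import IntEnum


class Stage(IntEnum):
    """The five incident stages, numbered as in the list above (hypothetical names)."""
    ASSESS_IMPACT = 1
    INFORM_CUSTOMERS = 2
    IDENTIFY_ISSUE = 3
    MITIGATE_ISSUE = 4
    RESOLVE_INCIDENT = 5


def status_update_overdue(
    last_update: datetime,
    now: datetime,
    stage: Stage,
    max_gap: timedelta = timedelta(minutes=20),  # assumed upper bound of the 15-20 minute window
) -> bool:
    """Return True if customers have been informed but not updated within max_gap."""
    # Before customers are first informed there is no cadence to keep yet.
    if stage < Stage.INFORM_CUSTOMERS:
        return False
    return now - last_update > max_gap
```

A scheduler or chat bot could call this every minute and remind the incident lead when it returns `True`.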

Introducing Adaptive Alerts: Detect application-level error trends

Adaptive Alerts is a new feature from Rollbar that adds to our reliable, informative and actionable alerts about unexpected issues in monitored applications and services. Adaptive Alerts uses anomaly detection to learn the standard behavior of enterprise applications, and alerts developers about atypical exception rates, reducing unwanted noise.
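As a rough illustration of what "learning standard behavior" can mean, the sketch below flags an exception rate as atypical when it deviates strongly from a recent baseline. It uses a simple z-score, which is an assumption chosen for illustration and not a description of how Rollbar's Adaptive Alerts work; the function name and threshold are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence


def is_atypical(history: Sequence[float], current: float, threshold: float = 3.0) -> bool:
    """Return True if `current` is more than `threshold` standard deviations from the baseline."""
    # With fewer than two samples there is no baseline to compare against.
    if len(history) < 2:
        return False
    baseline = mean(history)
    spread = stdev(history)
    # A perfectly flat baseline makes any change atypical.
    if spread == 0:
        return current != baseline
    return abs(current - baseline) / spread > threshold
```

For example, with hourly exception counts of `[10, 12, 11, 9, 10, 11]`, a new count of 40 would be flagged while a count of 11 would not.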

TL;DR InfluxDB Tech Tips - Visualizing Uptime with Flux deadman() Function in InfluxDB Dashboards

A common DevOps use case involves alerting when hosts stop reporting metrics, aka a deadman alert. This can be done using the monitor.deadman() Flux function. One can easily create a deadman (or threshold) check in the InfluxDB UI Alerts section or craft a custom task to alert as well. Check out InfluxDB’s Checks and Notifications system post for more details. It’s also possible to use the monitor.deadman() function directly in a dashboard cell.
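The idea behind a deadman alert can be sketched in a few lines of plain Python: given the last time each host reported, list the hosts that have been silent for too long. This is an illustration of the concept only, not Flux and not the InfluxDB API; the function name, the `last_seen` mapping and the five-minute default are assumptions.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def dead_hosts(
    last_seen: Dict[str, datetime],
    now: datetime,
    max_silence: timedelta = timedelta(minutes=5),  # assumed tolerance before a host counts as dead
) -> List[str]:
    """Return the hosts whose most recent report is older than max_silence, sorted by name."""
    cutoff = now - max_silence
    return sorted(host for host, seen in last_seen.items() if seen < cutoff)
```

In a real setup the `last_seen` values would come from the latest timestamp of each host's metrics, and the result would feed a notification rule.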

December 2021 Update - On-duty board, Manual Signls and Azure Sentinel update

Our December update brings a ‘Who is on duty’ board displaying current team members on duty with contact information. In addition, we have simplified the manual sending of Signls and improved the integration with Azure Sentinel. As always, you can find all the details in this article.

Understand the scope of user impact with Watchdog Impact Analysis

Watchdog is Datadog’s machine learning and AI engine, which leverages algorithms like anomaly detection to automatically surface performance issues in your infrastructure and applications. Without any manual setup or configuration, Watchdog generates a feed of Alerts—on anomalies such as latency spikes, elevated error rates, and network issues in cloud providers—to help you reduce your mean time to detection.

Enterprise Alert 9.1 Update brings Microsoft Teams and SIGNL4 connectivity

As announced at the User Group Meeting 2021, we are now releasing Enterprise Alert 9.1. This version brings a set of new features that extend its capabilities in some crucial areas. As always, you will find more details, release notes and downloadable installer files in the online user group. You can also watch the session from our UGM on YouTube.