
Microsoft's 3 major incidents in 10 days: where did they go wrong?

Just in case you haven’t heard, last week Microsoft experienced a huge outage that prevented users from accessing Office 365, its cloud-based subscription service, which serves 200 million monthly active users. This outage was the third in ten days, and it brought the company a deluge of customer complaints about a 'something went wrong' message that appeared when users tried to access their accounts.

October 2020 Update: Mute overwrite for iPhone (Critical Alerts), undo and more

Our October update brings the long-awaited mute-overwrite on iPhone (‘critical alerts’). We also introduce an undo action for Signl acknowledgements and closures, and in the web app you can now acknowledge and close multiple Signls at once. All new features are introduced below – enjoy.

Anomaly detection 101

What is anomaly detection? Anomaly detection (aka outlier analysis) is a step in data mining that identifies data points, events, and/or observations that deviate from a dataset’s normal behavior. Anomalous data can indicate critical incidents, such as a technical glitch, or potential opportunities, such as a change in consumer behavior. Machine learning is increasingly being used to automate anomaly detection.
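To make "deviates from normal behavior" concrete, here is a minimal statistical sketch rather than a machine-learning one: it flags points using the modified z-score, which measures distance from the median in units of the median absolute deviation (MAD). The function name `mad_anomalies`, the 3.5 threshold and the sample latency data are illustrative assumptions, not taken from the article.

```python
from statistics import median
from typing import List, Sequence


def mad_anomalies(values: Sequence[float], threshold: float = 3.5) -> List[int]:
    """Return the indices of points whose modified z-score exceeds `threshold`.

    The median and the median absolute deviation (MAD) are used instead of
    the mean and standard deviation, so one extreme value cannot inflate the
    spread estimate and hide itself.
    """
    if not values:
        return []
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # At least half the points are identical: no usable spread estimate.
        return []
    # 0.6745 is the 0.75 quantile of the standard normal distribution; it
    # scales the MAD to be comparable with a standard deviation.
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Hypothetical response times in milliseconds with one obvious spike.
latencies_ms = [120, 118, 125, 122, 119, 121, 950, 123]
print(mad_anomalies(latencies_ms))  # [6]
```

The median-based variant is used here because a plain mean/standard-deviation z-score on a small sample can fail to flag the spike: the outlier itself inflates the standard deviation it is measured against.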

How SIGNL4 provides for a digital handover procedure

Handover procedures in operations and maintenance are a key element of business continuity. Because work in this field is usually organized in shifts, it is essential to keep track of critical incidents, machine breakdowns, job ownership and completion, open or unresolved issues, and other related items. This knowledge makes a timely, or even proactive, response possible, for instance when issues resurface.
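As a rough illustration of what a digital handover has to capture, the sketch below models handover items with an owner, a status and a note, and builds a summary of everything the incoming shift still has to deal with. The `HandoverItem` type, the status values and the report format are hypothetical; this is not how SIGNL4 stores or presents handovers.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical workflow states; a real tool will define its own.
VALID_STATUSES = {"open", "in_progress", "resolved"}


@dataclass
class HandoverItem:
    title: str
    owner: str
    status: str
    note: str = ""

    def __post_init__(self) -> None:
        # Reject unknown statuses so no item silently drops out of the report.
        if self.status not in VALID_STATUSES:
            raise ValueError(f"unknown status: {self.status!r}")


def handover_report(items: List[HandoverItem], outgoing: str, incoming: str) -> str:
    """Summarize the items that carry over from one shift to the next."""
    # Resolved items are left out; everything else carries over with its owner.
    pending = [item for item in items if item.status != "resolved"]
    lines = [f"Handover {outgoing} -> {incoming}: {len(pending)} open item(s)"]
    for item in pending:
        line = f"- [{item.status}] {item.title} (owner: {item.owner})"
        if item.note:
            line += f"; {item.note}"
        lines.append(line)
    return "\n".join(lines)


items = [
    HandoverItem("Pump 3 vibration alarm", "alice", "in_progress", "bearing replacement scheduled"),
    HandoverItem("Conveyor jam on line 2", "bob", "resolved"),
    HandoverItem("Sensor 12 offline", "alice", "open"),
]
report = handover_report(items, "night", "early")
print(report)
```

Keeping ownership on every carried-over item is the point of the exercise: the incoming shift sees not just what is still open but who was working on it.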

Streamline communication workflows with the Datadog Slack App

Sharing information about the health and performance of an application is a critical part of any team’s daily workflow. That’s why we’re excited to announce the Datadog Slack App, which simplifies crucial communication tasks by deepening the integration between Datadog and Slack.

Top 6 Functional AIOps Requirements to Evaluate in Your RFP

AIOps adoption is on the rise. According to Gartner, by 2023, 40 percent of DevOps teams will augment application and infrastructure monitoring tools with AIOps platform capabilities. Use cases are also expanding beyond IT operations to include IT Service Management (ITSM), digital experience monitoring (DEM), DevOps, Application Performance Monitoring (APM) and third-party services.

How to: Automatically Archive Incident Slack Channels using conditions in FireHydrant Runbooks

FireHydrant’s Slack integration is a great way to speed up your incident response, especially if FireHydrant Runbooks is automatically creating channels in your Slack workspace for each incident. “But what happens after the incident?” First of all, you shouldn’t have to archive those Slack channels manually, especially when you don’t want them clogging up the Slack navigation bar.
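FireHydrant sets this up through Runbook conditions, not code, so the following is only a hypothetical model of the kind of rule involved: archive an incident channel once the incident is resolved and a grace period has passed. The `IncidentChannel` type, its status values and the 24-hour default are assumptions, not FireHydrant's data model; the actual archiving would be done by the integration itself (outside FireHydrant, Slack's `conversations.archive` Web API method performs that step).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class IncidentChannel:
    channel_name: str
    incident_status: str              # hypothetical values: "active", "resolved"
    resolved_at: Optional[datetime]   # None while the incident is still open


def channels_to_archive(
    channels: List[IncidentChannel],
    now: datetime,
    grace: timedelta = timedelta(hours=24),
) -> List[str]:
    """Return the names of channels whose incident was resolved at least `grace` ago."""
    return [
        c.channel_name
        for c in channels
        if c.incident_status == "resolved"
        and c.resolved_at is not None
        and now - c.resolved_at >= grace
    ]


now = datetime(2020, 10, 20, 12, 0, tzinfo=timezone.utc)
channels = [
    IncidentChannel("inc-101-db-latency", "resolved", now - timedelta(hours=30)),
    IncidentChannel("inc-102-login-errors", "resolved", now - timedelta(hours=2)),
    IncidentChannel("inc-103-api-5xx", "active", None),
]
print(channels_to_archive(channels, now))  # ['inc-101-db-latency']
```

The grace period keeps a freshly resolved channel around for follow-up discussion instead of archiving it the moment the incident closes.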

Detecting Security Vulnerabilities with Alerts

Every day we discover new vulnerabilities in our systems: cracks in the fence that adversaries exploit to get into an organization and wreak havoc. Understanding what you have in your environment (e.g., types of devices, systems, equipment) is very important for making sure that the controls in place are working and, more importantly, that they keep up with the threat landscape.