The latest news and information on IT networks and related technologies.
The entire reason we have monitoring is to understand what users are experiencing with an application. Full stop. If the user experience is impacted, sound the alarm and get people out of bed if necessary. All the other telemetry can be used to understand the details of the impact. But lower-level data points no longer have to be the trigger point for alerts.
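As a rough illustration of that idea, the sketch below pages only when a user-experience measure drops below a target, and attaches lower-level telemetry purely as context. Everything in it is an assumption made for the example: the `Request` type, the function names, the 1000 ms latency budget, and the 99% target are hypothetical and do not come from any particular monitoring product.

```python
from dataclasses import dataclass


@dataclass
class Request:
    ok: bool            # did the user get a successful response?
    latency_ms: float   # how long the user waited


def good_experience_ratio(requests, max_latency_ms=1000.0):
    """Fraction of requests that succeeded within the latency budget."""
    if not requests:
        return 1.0  # assumption: no traffic means no user impact
    good = sum(1 for r in requests if r.ok and r.latency_ms <= max_latency_ms)
    return good / len(requests)


def evaluate(requests, low_level_metrics, target=0.99):
    """Page only on user impact; low-level metrics ride along as context."""
    ratio = good_experience_ratio(requests)
    if ratio < target:
        return {"page": True, "ratio": ratio, "context": low_level_metrics}
    # Low-level metrics alone (for example, high CPU) never trigger a page.
    return {"page": False, "ratio": ratio, "context": {}}
```

In this sketch, a host running at 99% CPU does not wake anyone up as long as users are still being served within the budget; the CPU figure only appears in the alert once the user-facing ratio has already fallen below target.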
Last week a major internet outage took out one of Australia’s biggest telecoms. In a statement released yesterday, Optus blamed the hours-long outage, which left millions of Australians without telephone and internet service, on a route leak from a sibling company. In this post, we discuss the outage and how it compares to the historic outage suffered by Canadian telecom Rogers in July 2022.
Incident response in a Network Operations Center (NOC) is cumbersome and time-consuming. There are many steps, many sources of incidents, and a long list of complexities involved. It starts with initial monitoring: the Tier 1 “eyes on glass” work of watching incoming alerts and determining what each one indicates, such as a security breach, a performance issue, or a hardware failure.