
Latest Posts

Best Practices for Maximizing the Value of Situation Alarms

Today, IT operations teams have to process large volumes of events or alarms in near real-time in order to protect service levels, stay competitive, and deliver a great experience to customers. If it takes too long for teams to spot and repair issues, an organization runs the risk of significant business service downtime, SLA penalties, and damage to its brand reputation. As IT landscapes continue to grow in scale and complexity, guarding against these risks becomes increasingly difficult.

The Importance of Network Insights in Achieving Full End-To-End Observability

When we talk about observability, we tend to focus first and foremost on the metrics, logs, and traces that you can collect from applications – such as request rates, error rates, and request duration. Infrastructure-level metrics, like CPU and memory utilization, might factor into the discussion as well. Here’s a third category of critical observability insights that teams tend to overlook: the network.

The Importance of Observability for the SRE

The term Site Reliability Engineer (SRE) first appeared at Google in the early 2000s. In Google’s 2016 SRE Book, Benjamin Treynor Sloss wrote that, generally speaking, “an SRE team is responsible for the availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning of their service(s).” This means that the SRE teams at Google decide how a system should run in production as well as how to make it run that way.

Don't Settle for Observability. Strive for Actionability

You’ve heard of observability, which has fast become one of the IT industry’s buzzwords du jour. But what about actionability, or the ability to translate observability into meaningful action? The latter term may not be a trending buzzword (not yet) – indeed, “actionability” perhaps sounds almost boring – but it’s just as essential as observability in managing complex, cloud-native environments.

Gaining Situational Awareness at Every Point of Sale with Broadcom's DX APM App Experience Analytics

The National Retail Federation forecasted historic holiday sales for the 2021 season, as retailers grappled with high volumes of in-store and digital traffic, along with a need for full visibility into the user experience. They turned to monitoring their Point-of-Sale (POS) systems for key analytics that revealed unique, real-time details about what customers were experiencing.

Why "AIOps vs. Observability" Is a False Dilemma

What comes first – observability or AIOps? Can you achieve observability without AIOps? Do you need AIOps if you already have an observability solution in place? These are all questions that any team considering AIOps will want to answer in order to determine the real-world value that AIOps tools stand to offer.

What is the Purpose of Observability? In a Word, Innovation

Asking an IT engineer or SRE to define the purpose of observability is kind of like asking someone to explain the purpose of life: There are lots of different opinions out there, and no way of proving any of them right or wrong. You could argue that observability is just a buzzword that refers to what used to be called monitoring.

Anomaly Detection

IT Operations has a wide spectrum of roles and responsibilities. The positions range from level 1 (L1) operators to Site Reliability Engineers (SREs) and everything in between. L1 operators, for example, are (often) almost exclusively reactive. They feed off the constant stream of incidents reported by clients and events that are reported by monitoring and alerting systems. This is in contrast to SREs, who work at the other end of the spectrum.

What's New with DX Unified Infrastructure Management 20.4

DX Unified Infrastructure Management (DX UIM) enables comprehensive infrastructure observability. The solution delivers broad coverage, modern administrative and operator consoles, zero-touch configuration, advanced alarm management, and more, providing a unified, data-driven approach to infrastructure management. With it, your teams can proactively and efficiently manage all your digital ecosystems, including private and public clouds.

Enterprise IT Dashboards

Interpreting data and making fast decisions is critical for any leader in today's business world. But how is it done? Everyone remembers the old way of doing things, where analysts would manually crunch the numbers and produce a final output. This business intelligence would be presented to their boss, and decisions would be made. This batch approach to running and presenting numbers is not sustainable because of the massive manual effort required to recompile datasets and present them properly.