Latest News

Automating Operations via Closed-Loop Remediation

It's hard enough to run an operations center in the best of times, especially in large, complex environments supporting myriad applications, and the challenges are many. Now throw in the current set of challenges with personnel being remote, and the problems are compounded exponentially. The ability to "tap a shoulder" or hold a "conference room huddle," while not always the most efficient approach to begin with, is no longer an option.

Why modern testing requires Chaos Engineering

Modern applications are changing, and traditional testing practices are no longer up to the task. Learn more about the changing landscape of QA and how Chaos Engineering provides the necessary framework for testing modern applications. Chaos and Reliability Engineering techniques are quickly gaining traction as essential disciplines for building reliable applications. Many organizations have embraced Chaos Engineering over the last few years.

Scaling Fleet and Kubernetes to a Million Clusters

We created the Fleet Project to provide centralized GitOps-style management of a large number of Kubernetes clusters. A key design goal of Fleet is to be able to manage 1 million geographically distributed clusters. When we architected Fleet, we wanted to use a standard Kubernetes controller architecture. This meant that, in order to scale, we needed to prove we could scale Kubernetes much farther than we ever had.

CloudFabrix featured in "Top 20 vendors shaping IT Performance" by Digital Enterprise Journal (DEJ)

Emerging digital IT paradigm shifts such as Hybrid IT, Multi-Cloud, Microservices and Containerization, Serverless, and Software-Defined Datacenter are creating compelling new opportunities for IT leaders. However, these same paradigm shifts have also led to a drastic increase in monitored assets, numerous operational tools, and exponential growth of operational data.

Knowing When to Say Goodbye

By design and tradition, telecoms networks are built to last. But in a world where the rate of innovation seems to be accelerating, the end result is that a lot of legacy infrastructure needs to keep pace with, and accommodate, multiple ‘next generation’ phases. How long this can be maintained before the imperative to rip and replace becomes impossible to ignore is the multi-million-dollar question.

How to Manage AWS Cost Outliers

A few years ago, we realized that spending in our AWS product test environment had jumped significantly from one month to the next. We drilled down into the issue and traced it to some RDS database instances that had been spun up to test new product features. No one realized that these expensive instances had been left running after the tests were complete, and they went on racking up charges for several months.

Observability with Context: Telemetry, Time, Tracing, and Topology

What changed? That’s the question ops personnel have been asking for decades whenever something goes wrong in the production IT environment. Everything was working before, so the reasoning goes, and now it’s not. We have an incident. And to figure out what caused the incident, and hence to have any idea how to fix it, we must know what changed. There’s just one problem with this approach. What if everything is subject to change, all the time?

Ivanti Patch Management Technology Enhances XM Cyber's Breach and Attack Simulation (BAS) Platform

In today's press release, we announced the incorporation of Ivanti patch management technology into the XM Cyber BAS platform. XM Cyber is a multi-award-winning leader in breach and attack simulation (BAS), advanced cyber risk analytics, and cloud security posture management.

PuTTY from a monitoring perspective

PuTTY is a free program (MIT license) for x86 and AMD64 architectures (with ARM support now in experimental stages). It was developed in 1997 by Simon Tatham, a British programmer. We have been covering this useful program on this blog for several years, and in 2020 the Pandora FMS team again included it in its list of network commands for Microsoft Windows® and GNU/Linux®. Does it deserve its own article? Read on and judge for yourselves.

Managing IT at Scale: Distributed Monitoring for Large IT Environments

Growth is exciting for an enterprise, but it often presents a unique challenge for IT professionals. Common roadblocks arise when trying to scale up an IT management environment. In this first blog of our Managing IT Infrastructure at Scale series, we discuss the benefits of distributed monitoring data for large IT environments.