
Latest Posts

Welcome to Netdata's community repository: Consul, Ansible, ML

On our journey to democratize monitoring, we are proud to have open source at the core of both our products and our company values. What started as a project born out of frustration with the lack of existing alternatives (see anger-driven development) quickly became one of the most-starred open-source projects on all of GitHub.

An Introduction to our New Product: Logz.io Distributed Tracing

Yesterday we were excited to announce Logz.io Distributed Tracing, the most recent addition to our Cloud-Native Observability Platform. This is such a special launch for us because it makes Logz.io the only place where engineers can use the best open source monitoring tools for logs, metrics, and traces – known as the ‘three pillars’ of observability – together in one place.

Automating Operations via Closed-Loop Remediation

It's hard enough to run an operations center in the best of times, especially in large, complex environments supporting myriad applications. Some of the many challenges are: Now throw in the current set of challenges with personnel being remote, and the problems get compounded exponentially. The ability to "tap the shoulder" or "conference room huddle," while not always the most efficient to begin with, is no longer an option.

Why modern testing requires Chaos Engineering

Modern applications are changing, and traditional testing practices are no longer up to the task. Learn more about the changing landscape of QA and how Chaos Engineering provides the necessary framework for testing modern applications. Chaos and Reliability Engineering techniques are quickly gaining traction as essential disciplines for building reliable applications. Many organizations have embraced Chaos Engineering over the last few years.

Scaling Fleet and Kubernetes to a Million Clusters

We created the Fleet Project to provide centralized GitOps-style management of a large number of Kubernetes clusters. A key design goal of Fleet is to be able to manage 1 million geographically distributed clusters. When we architected Fleet, we wanted to use a standard Kubernetes controller architecture. This meant that, in order to scale Fleet, we needed to prove we could scale Kubernetes much further than we ever had.

CloudFabrix featured in "Top 20 vendors shaping IT Performance" by Digital Enterprise Journal (DEJ)

Emerging digital IT paradigm shifts like Hybrid IT, Multi-Cloud, Microservices & Containerization, Serverless, Software-Defined Datacenter, etc. are creating compelling new opportunities for IT leaders. However, these same paradigm shifts have also led to a drastic increase in monitored assets, numerous operational tools, and exponential growth of operational data.

Knowing When to Say Goodbye

By design and tradition, telecoms networks are built to last. But in a world where the rate of innovation seems to be accelerating, a lot of legacy infrastructure has to keep pace with, and accommodate, multiple ‘next generation’ phases. How long this can be maintained before the imperative to rip and replace becomes impossible to ignore is the multi-million-dollar question.

How to Manage AWS Cost Outliers

A few years ago, we realized that spending in our AWS product test environment had jumped significantly from one month to the next. We drilled down into the issue and traced it to some RDS database instances that had been spun up to test new product features. No one realized that these expensive instances had been left running after the tests were complete, and they had been racking up charges for several months.

Observability with Context: Telemetry, Time, Tracing, and Topology

“What changed?” That’s the question ops personnel have been asking for decades whenever something goes wrong in the production IT environment. Everything was working before, so the reasoning goes, and now it’s not. We have an incident. And to figure out what caused the incident – and hence, to have any idea how to fix it – we must know what changed. There’s just one problem with this approach. What if everything is subject to change, all the time?

Ivanti Patch Management Technology Enhances XM Cyber's Breach and Attack Simulation (BAS) Platform

In today's press release, we announced the incorporation of Ivanti patch management technology into the XM Cyber BAS platform! XM Cyber is a multi-award-winning leader in breach and attack simulation (BAS), advanced cyber risk analytics, and cloud security posture management.