The latest news and information on DevOps, CI/CD, automation, and related technologies.

Release v2.1: Performance & Scalability Improvements, Configurable Alert Repeat Notifications & more

The Netdata Team is very excited to introduce Netdata v2.1 and all of its new features and improvements.

Release highlights:

Major Performance and Scalability Improvements: This release significantly enhances Netdata's performance and streaming capabilities, with particular focus on multi-parent infrastructures.

Cloud: Automated Room Assignment with Label-Based Rules: The Netdata Cloud Dashboard introduces node membership rules, a powerful new feature that transforms how you organize your infrastructure monitoring.

Sponsored Post

Testing Kubernetes Ingress with Production Traffic

Kubernetes is an incredibly powerful solution, but testing the Kubernetes Ingress resources themselves can be tricky. This can lead to significant frustration for developers: bugs that weren't caught during testing can surface in production, and workflows that make sense on paper can fail in practice.

Incident Management Beyond Alerting: Utilizing Data & Automation for Continuous Improvement

Managing incidents effectively is not just about responding to alerts; it’s about building a resilient system that thrives on continuous improvement. Modern organizations operate in complex environments where even minor disruptions can escalate into major issues. This calls for a proactive approach that leverages data and automation to optimize the entire incident response lifecycle.

Decoding devices with DHCP fingerprinting for smart IP address assignment

In today’s dynamic network environments, where countless devices—ranging from laptops and smartphones to IoT sensors and smart appliances—connect and communicate, efficient IP address management is critical. Ensuring each device receives the right configuration not only optimizes network performance but also improves visibility and control. However, identifying these devices accurately can be challenging, given the diversity of operating systems, hardware, and vendors.

A Guide to Optimizing Kubernetes Clusters with Karpenter

With the promise of auto-provisioning and self-healing, Kubernetes environments can be an attractive option for hosting your application platform. However, with tightening budgets, a competitive field of cloud providers and offerings, and the need to do more with less, engineers are looking to get a handle on their resource utilization.

Your Guide To Datadog Cost Optimization: 7 Tips For Reducing Spend

As cloud systems become increasingly sophisticated, you want a cloud monitoring platform that helps you identify, isolate, and fix root-cause issues. Meanwhile, engineering leaders are under increasing pressure to reduce technology costs as the global economic outlook remains uncertain. With Datadog, you can observe, monitor, analyze, and report on the health of your infrastructure, applications, and services in any cloud and at scale.

How to support a growing Kubernetes cluster with a small etcd

Etcd plays a critical role in your Kubernetes setup: it stores the ever-changing state of your cluster and its objects, and the API server uses this data to manage cluster resources. As your applications thrive and your Kubernetes clusters see more traffic, etcd handles an increasing amount of data. But etcd’s storage space is limited: the recommended maximum is 8 GiB, and a large and dynamic cluster can easily generate enough data to reach that limit.

Building RAG with enterprise open source AI infrastructure

One of the most critical gaps in traditional Large Language Models (LLMs) is that they rely on static knowledge already contained within them. Basically, they might be very good at understanding and responding to prompts, but they often fall short in providing current or highly specific information.