Honeycomb + Google Gemini

Today at Google Next, Charity Majors demonstrated how to use Honeycomb to find unexpected problems in our generative AI integration. Software components that integrate with AI products like Google’s Gemini are powerful in their ability to surprise us. Nondeterministic behavior means there is no such thing as “fully tested.” Never has there been more of a need for testing in production!

How to standardize resiliency on Kubernetes

There’s more pressure than ever to deliver high-availability Kubernetes systems, but a combination of organizational and technological hurdles makes this easier said than done. Technologically, Kubernetes is complex and ephemeral, with deployments that span infrastructure, cluster, node, and pod layers. As with any complex and ephemeral system, the large number of constantly changing parts opens the possibility of sudden, unexpected failures.

Stay up to date on the latest incidents with Bits AI

Since the release of ChatGPT, there’s been growing excitement about the potential of generative AI—a class of artificial intelligence trained on pre-existing datasets to generate text, images, videos, and other media—to transform global businesses. Last year, we released our own generative AI-powered DevOps copilot called Bits AI in private beta. Bits AI provides a conversational UI to explore observability data using natural language.

Step-by-Step Guide to Monitoring Your SNMP Devices With Telegraf

Monitoring SNMP (Simple Network Management Protocol) devices is crucial for maintaining network health and security, enabling early detection of issues and proactive troubleshooting. Continuous monitoring ensures efficient resource utilization, minimizes downtime, and enhances overall network performance. In this article, we'll detail how to use the Telegraf agent to collect SNMP (MIB) performance statistics that you can forward to a data source.

The Complete Guide to Capacity Management in Kubernetes

In the dynamic world of container orchestration, Kubernetes stands out as the undisputed champion, empowering organizations to scale and deploy applications seamlessly. Yet, as the deployment scope increases, so do the associated Kubernetes workload costs, and the need for effective resource capacity planning becomes more critical than ever. When dealing with containers and Kubernetes, you can face multiple challenges that affect your cluster stability and your business performance.

Netdata is the only real-time monitoring solution: Justified

In the digital era, where data flows like a ceaseless river, real-time monitoring stands as a pivotal technology, allowing organizations to not only keep pace but also to deeply understand the intricate dance of their operational ecosystems. This technology is not just about keeping tabs; it’s about gaining a profound, almost intuitive sense of the micro-worlds within which systems, containers, services, and applications pulse and thrive.

Understanding Monitoring Tools

If you care about operational excellence when it comes to your IT infrastructure, the role of monitoring systems is pivotal. As we navigate through the myriad of available monitoring tools, it becomes essential to understand the distinct architectures, styles, and focal points of various monitoring solutions, as well as the time-to-value they offer.

Mastering Test Automation Strategies for Efficient Quality Assurance

In the software development landscape, mastering test automation has become crucial for ensuring efficient quality assurance (QA) processes. Test automation not only accelerates testing cycles but also enhances test coverage and accuracy, leading to higher-quality software releases. This blog explores key strategies for mastering test automation and achieving efficient QA outcomes.

Introducing Playbooks automation

We're rolling out Playbooks, our latest step toward fully automating the incident response process. Imagine that every action you, as an incident responder, had to take manually is now fully automated with Playbooks. Steps like initiating a war room (video conference), logging incidents, sending out alerts, and running diagnostic scripts are now executed with precision, every single time, without you lifting a finger.

Why Selector's SREs Chose Selector for Kubernetes and Multi-cloud Application Observability

Selector offers comprehensive monitoring, observability, and AIOps solutions for service providers and enterprises. The process begins with collecting, aggregating, and analyzing multi-domain operational data from various sources, such as SNMP, streaming telemetry, syslogs, and Kafka. Selector then applies advanced AI/ML techniques to power features such as anomaly detection, event correlation, root cause analysis (RCA), smart alerting, and a conversational GenAI-driven chat tool, Selector Copilot.