Observability: It's Every Engineer's Job, Not Just Ops' Problem

For years, organizations have used the term “observability” as an evolution of monitoring, a discipline practiced by operations teams to understand whether production software was working. I’ve been annoyed by this—not because it’s philosophically wrong, but because it diminishes the importance of observability as a generalized software engineering practice.

TCP Monitoring With AppNeta: Why Expanded Support is a Game Changer

Broadcom continues to expand the capabilities of AppNeta by Broadcom, offering ongoing enhancements in features and value. With the introduction of TCP protocol support, users can now achieve more streamlined setup processes and deeper visibility into modern network paths. These enhancements help eliminate blind spots and improve monitoring accuracy across complex network environments. Review this post to learn more about these valuable new capabilities.

Calico Open Source 3.30: Exploring the Goldmane API for custom Kubernetes Network Observability

Kubernetes is built on the foundation of APIs and abstraction, and Calico leverages its extensibility to deliver network security and observability in both its commercial and open source versions. APIs are the special sauce that helps automate and operationalize your Kubernetes platforms as part of a CI/CD pipeline and other GitOps workflows. Calico OSS 3.30 introduces numerous battle-tested observability and security tools from our commercial editions. This includes the following key features.

Deadman Alerts with the Python Processing Engine

Sometimes silence isn’t golden; it’s a red flag. Whether you’re monitoring IoT sensors, system logs, or application metrics, missing data can be just as critical as abnormal data. Without visibility into these gaps, you risk overlooking potential failures, security threats, or operational inefficiencies. In time series workflows, silence is often the first sign of trouble, whether the cause is a network issue, a device or sensor failure, or a stalled process.
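
As a rough illustration of the idea, a deadman check can be reduced to comparing each source's most recent timestamp against a silence threshold. The sketch below is plain Python and does not use the Python Processing Engine's own API; the function name `find_silent_sources`, the sensor names, and the five-minute threshold are all hypothetical choices made for the example.

```python
from datetime import datetime, timedelta, timezone


def find_silent_sources(last_seen, now, threshold):
    """Return (sorted) the sources whose latest data point is older than threshold.

    last_seen: dict mapping a source name to the timestamp of its last data point.
    now:       the reference time for the check.
    threshold: how long a source may stay quiet before it counts as silent.
    """
    return sorted(
        source
        for source, timestamp in last_seen.items()
        if now - timestamp > threshold
    )


# Hypothetical sample data: three sensors with different last-report times.
now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
last_seen = {
    "sensor-a": now - timedelta(seconds=30),
    "sensor-b": now - timedelta(minutes=12),
    "sensor-c": now - timedelta(minutes=5),
}

silent = find_silent_sources(last_seen, now, threshold=timedelta(minutes=5))
print(silent)  # ['sensor-b']
```

In a real pipeline the `last_seen` map would come from a query for the latest point per source, and the check would run on a schedule so that an alert fires when a source stops reporting rather than when it reports bad values.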

Comparing ELK, Grafana, and Prometheus for Observability

Monitoring and observability are cornerstones of modern infrastructure management. Three popular solutions that often come up in this space are the ELK Stack, Grafana, and Prometheus. This comparison breaks down the key differences, use cases, and integration capabilities to help you determine which tool, or combination of tools, best suits your operational needs.

Leveraging an IDP for Navigating Staff Changes: Onboarding and Layoffs

Change is constant in engineering organizations. Whether you’re growing quickly and onboarding dozens of engineers—or navigating the difficult process of layoffs—your systems, services, and institutional knowledge don’t pause. That’s where an Internal Developer Portal (IDP) becomes indispensable.

ELK vs CloudWatch - Choosing the Right Monitoring Tool

In today’s evolving cloud-native landscape, having a reliable monitoring and observability setup is essential for maintaining application health and performance. Two widely used solutions, Amazon CloudWatch and the ELK Stack (Elasticsearch, Logstash, and Kibana), offer powerful capabilities for log management, metrics, and alerting. But each serves different needs and environments.

Opsgenie Is Sunsetting: What to Look for in an Alternative

Atlassian is retiring Opsgenie, and if you're one of the teams relying on it to manage on-call and incidents, you're facing a tough question: Do you make the forced migration to Jira Service Management or Compass, scramble for a lookalike tool — or use this moment to upgrade your entire approach to incident response? If you’re facing that decision, we get it. Changing tools midstream isn’t ideal (to say the least). But it’s also a rare opportunity to take a meaningful step forward.

The Critical Role of Observability in Healthcare IT

Healthcare organizations are increasingly leading the charge in technology adoption, rapidly deploying advanced applications and digital tools to improve patient outcomes and operational efficiency. However, this acceleration is placing unprecedented pressure on existing IT infrastructure. Teams are being asked to support next-generation workloads, such as AI-powered diagnostics and real-time data platforms, on legacy systems, often without the benefit of increased budget or headcount.