
Global Industrial Leader Coordinates Severity 1 Incidents with Clarity and Speed

“The first 15 minutes of a Sev-1 incident often determine the next 15 hours.” For a multi-billion-dollar global industrial leader, managing Severity 1 incidents across a complex, distributed infrastructure is a high-stakes operation. When systems go down, the impact is felt instantly across production lines and global logistics.

How Techdome accelerates AI-led product delivery with Civo Kubernetes

Accessing cloud infrastructure shouldn’t slow down product innovation. Yet for many engineering teams building AI-driven platforms, traditional hyperscalers often introduce unnecessary complexity, high costs, and slow provisioning cycles. At Civo, we’ve seen a different approach emerge. Our cloud platform enables teams to move faster with Kubernetes, compute, and networking designed for simplicity and speed.

How Imperva Gets Traffic Answers in Seconds with Kentik

Imperva Network Architect Wallace Lee shares how Kentik helps teams drill deeper than traditional reporting tools to improve network and customer experience. He recounts how, during a live architecture review, Imperva’s Kentik power users answered a critical “are we safe?” traffic question in seconds. Kentik lets engineers see prefix-level bandwidth instantly and shows exactly which ASN and ISP the traffic came from. Wallace also highlights how Kentik makes Anycast traffic visibility an “easy win,” helping teams move quickly from questions to confident decisions.

Colsubsidio transforms business process monitoring with Elastic Observability

Colsubsidio is one of the largest and most representative family compensation funds in Colombia. The organization manages and delivers essential social services to millions of users through a broad network spanning health, education, subsidies, recreation, tourism, credit, housing, pharmacies, retail supply, culture, and labor welfare.

Case Study - Troubleshooting Storage Failures in a VMware ESXi Infrastructure

IT problems happen even in the best-architected infrastructure, whether from configuration changes, failures, or upgrades. How quickly and effectively you can detect and resolve such problems dictates how efficient your IT operation is. Today, I’ll cover how eG Enterprise helped us troubleshoot a hardware failure (a storage battery failure) that caused a cascade of failures in a VMware ESXi infrastructure.

Heartbeat behind the metrics | Hemachand on what visibility really means

What happens when observability grows faster than infrastructure? In this episode of Heartbeat Behind the Metrics, Hemachand Munagapati, Product Manager at Site24x7, reflects on over 15 years with the product and how the idea of a single pane of monitoring has shaped everything that followed.

How Dartmouth avoided vendor lock-in and implemented LBaaS with HAProxy One

History is everywhere at Dartmouth College, and while the campus is steeped in tradition, its IT infrastructure can’t afford to get stuck in the past. In an institution where world-class research and undergraduate studies intersect, technology must be fast, invisible, and – above all – reliable. That reliability was put to the test when Dartmouth’s load balancing vendor was acquired twice in five years, as Avi Networks moved to VMware and VMware moved to Broadcom.

How Okta keeps 99.99 percent uptime with Datadog

How do you maintain 99.99 percent uptime across thousands of Kubernetes hosts and multiple cloud providers? Okta engineers explain why observability is critical to keeping authentication and authorization services running at scale. Watch how Okta uses Datadog to bring metrics, logs, and traces into a single view, speed up root cause analysis, and reduce time to mitigation while controlling costs.