Systems need to be failure proof

Nov 4, 2025 By Gremlin In Gremlin

The best advice from Anish Behanan at @Capgemini about reliability? Every system needs to be failure proof.

View Video

If you don't test in production, you're missing risks

Oct 31, 2025 By Gremlin In Gremlin

Testing in production can be scary, but it’s necessary to improve reliability. Check out this clip from when Gremlin Co-founder and CEO Kolton Andrus sat down with Stephen Townshend on the Slight Reliability podcast!

View Video

Validating chaos experiments with GCP Cloud Monitoring probes

Oct 31, 2025 By Ashutosh Bhadauriya In Harness

GCP Cloud Monitoring probes let you transform your existing GCP metrics into automated pass/fail validation for chaos experiments, replacing subjective observation with objective measurement. With flexible authentication options (workload identity or service account keys) and PromQL query support, you can validate infrastructure performance against defined thresholds during controlled failure scenarios.
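
To make the pass/fail idea concrete, here is a minimal sketch in Python of how metric samples might be checked against a defined threshold. It is not the Harness or GCP API: the names ThresholdCheck and evaluate_probe, the comparator strings, and the sample CPU values are all hypothetical, and the samples are assumed to have been fetched already (for example, by a PromQL query).

```python
import operator
from dataclasses import dataclass
from typing import Sequence

# Comparators a probe might support (assumed set, for illustration only).
COMPARATORS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
}


@dataclass(frozen=True)
class ThresholdCheck:
    """Hypothetical success criterion: `value <comparator> threshold`."""
    comparator: str
    threshold: float

    def passes(self, value: float) -> bool:
        return COMPARATORS[self.comparator](value, self.threshold)


def evaluate_probe(samples: Sequence[float], check: ThresholdCheck) -> bool:
    """Pass only if at least one sample exists and every sample meets the check."""
    return bool(samples) and all(check.passes(v) for v in samples)


# Hypothetical CPU utilization samples collected during an experiment.
cpu_samples = [0.42, 0.57, 0.61]
check = ThresholdCheck(comparator="<=", threshold=0.80)
print("PASS" if evaluate_probe(cpu_samples, check) else "FAIL")
```

Treating an empty result as a failure is a design choice in this sketch: a probe that returns no data has not shown that the system stayed within its threshold.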

Read Post

Field of Dreams DevOps doesn't scale

Oct 29, 2025 By Gremlin In Gremlin

Having trouble scaling reliability? Gremlin CEO Kolton Andrus explains why just building a great tool isn’t enough.

View Video

Monitoring Chaos Experiments with New Relic Probe in Harness

Oct 28, 2025 By Ashutosh Bhadauriya In Harness

New Relic probes in Harness Chaos Engineering let you automatically validate system performance against defined SLOs during chaos experiments, transforming subjective testing into objective, metrics-driven resilience validation. By querying New Relic metrics in real time and comparing results against your success criteria, you can programmatically verify that your systems maintain acceptable performance levels even under failure conditions.

Read Post

Change engineering culture with Chaos Engineering

Oct 23, 2025 By Gremlin In Gremlin

How do you spur an engineering cultural shift with Chaos Engineering? Gremlin founder and CEO Kolton Andrus explains how—and how it changed the Gremlin platform.

View Video

Scale Chaos Engineering with Automation and AI

Oct 23, 2025 By Gremlin In Gremlin

Chaos Engineering and Fault Injection testing have been proven to prevent outages, increase availability, and help companies avoid costly downtime. But without the right processes or tools, they require specialized knowledge, a deep understanding of systems, and manual effort for every test. To fully realize the benefits of Chaos Engineering, testing needs to be adopted across all engineering teams without requiring a heavy lift or an investment that takes away from roadmap progress.

View Video

How to test the reliability of a Point of Sale (POS) system

Oct 20, 2025 By Gavin Cahill In Gremlin

Point of Sale (POS) systems are the backbone of any retail store. A single outage can cost retail companies thousands of dollars each minute in lost sales, and even more if it happens during peak hours. If the outage goes on too long, the damage compounds as customers abandon carts and turn to competitors. In an industry where customer loyalty is worth its weight in gold, that brand damage can end up costing more than the initial lost sales.

Read Post

Digital reliability has real-world impacts

Oct 16, 2025 By Gremlin In Gremlin

Gremlin co-founder and CEO Kolton Andrus reminds us that our digital infrastructure has real-world consequences when it fails.

View Video

3 key factors of reliability

Oct 15, 2025 By Gremlin In Gremlin

Amin Momin of @CapgeminiGlobal shares the three key factors every company needs to integrate into their reliability efforts. Companies that focus on these key factors save millions by decreasing their downtime and lessening its costly impact.

View Video