Latest Videos

More Resilience, Less Overhead: How to Modernize Disaster Recovery Testing

Jul 1, 2026 By Gremlin In Gremlin

Disaster recovery planning is essential for ensuring digital services remain online in the face of catastrophic failures or outages. When a major digital infrastructure outage occurs, systems need to be set up to automatically respond and restore functionality as quickly as possible. But no matter how in-depth your disaster recovery plan is, it’s still only theoretical until it’s thoroughly tested under realistic failure conditions, which is why testing is often mandated by leadership and regulators.


Test your AI model training reliability, too

Mar 13, 2026 By Gremlin In Gremlin

Training is at the heart of every LLM, but it’s still an application running on infrastructure, which means it can fail. Our GPU test helps you test your training GPUs so you don’t lose that valuable work. TRANSCRIPT: One of the things we built recently was the GPU Gremlin. So if you are training a bunch of models and you're doing a bunch of GPU testing, you know, we want to give you the tools to be able to go test that, to understand how training the model could fail.


You need to regularly test your reliability

Feb 24, 2026 By Gremlin In Gremlin

Reliability testing isn’t a one-and-done thing. You need to test on a regular schedule to make sure your system stays reliable as it, and the systems around it, change.


Disaster Recovery Testing by Gremlin

Feb 3, 2026 By Gremlin In Gremlin

Do you know how your system will respond when major outages strike? Disaster Recovery Testing safely simulates real catastrophic failures across your entire system. You can centrally and easily run zone, region, and datacenter-scale reliability tests across your entire organization simultaneously for disaster recovery, business continuity, compliance verification, and more. With Disaster Recovery Testing, tests that used to take engineering-months and dozens of experts can be done safely and securely in hours by a single person.


AI has to be auditable to be reliable

Jan 28, 2026 By Gremlin In Gremlin

In this clip from an AI roundtable with Gremlin, Nobl9, and PagerDuty, Mandi Walls talks about how companies will want to audit AI to keep it reliable.


AI reliability needs system reliability

Jan 22, 2026 By Gremlin In Gremlin

AI operates on the same systems and infrastructure as every application, which means if you want to keep it reliable, you have to keep the systems underneath it reliable. Gremlin CEO Kolton Andrus explains more in this clip from an AI reliability roundtable with @nobl9inc and @Pagerduty.


AI reliability requires different SLOs

Jan 16, 2026 By Gremlin In Gremlin

In this webinar clip, Alex Nauda, CTO of Nobl9, explains how keeping AI reliable means changing how you look at SLOs.


We test our own critical dependencies

Jan 14, 2026 By Gremlin In Gremlin

Even if you know a dependency is critical, you should still test it. Otherwise, who knows what will happen if it goes down?


AI reliability changes how you watch your systems

Jan 8, 2026 By Gremlin In Gremlin

In this clip from an AI roundtable with Gremlin, Nobl9, and PagerDuty, Mandi Walls talks about how AI shifts how you watch your systems to keep them reliable.


Chaos Engineering strengthens your team

Jan 6, 2026 By Gremlin In Gremlin

Reliability testing not only strengthens your system, it also strengthens your team.

