Latest Videos

SRE's Guide to Chaos & Observability

Jul 20, 2021 By Gremlin In Gremlin

Today’s distributed, cloud-based environments are incredibly complex. Not only does each component depend on many others, but modern systems are also highly dynamic—changing frequently as teams push new code or make updates to infrastructure. Taming this complexity to ensure reliability requires end-to-end observability to understand how components depend on each other. Additionally, proactive Chaos Engineering combined with AI-driven observability lets you uncover “unknown unknowns” that impact how your system will respond to different failure scenarios.

Building Reliable Applications Webinar 6 17 21

Jul 20, 2021 By Gremlin In Gremlin

Test-driven development (TDD) is a process that ensures quality in the applications we develop while guarding against feature creep/skew. But as our applications have become increasingly complex, traditional testing methods are not enough. Traditional testing only evaluates what we know, but complex systems often fail due to unknowns—the things that are almost impossible to test because we are unaware of them. Chaos Engineering is the exception that allows us to test for what we don’t know.

Intro to Chaos Engineering 5 11 21

Jul 20, 2021 By Gremlin In Gremlin

Gremlin ALFI Demo - AWS RDS Unavailable - Chaos Engineering

Jun 9, 2021 By Gremlin In Gremlin

In this demo, we'll share how you can use ALFI (Application Level Failure Injection) to make AWS RDS unavailable. This enables you to learn how your application handles different failure modes. We'll be using the ALFI Latency attack to perform this Chaos Engineering experiment.

Site Reliability Engineering for Kubernetes || Kubernetes In Production

May 7, 2021 By Gremlin In Gremlin

In this presentation, Tammy shares important failure modes to consider when you are responsible for the reliability of Kubernetes in your organization.

Fireside Chat with Jeff Smith and Matt Stratton Failover Conf 2021

Apr 29, 2021 By Gremlin In Gremlin

Matt Stratton, host of the Arrested DevOps podcast, will host Jeff Smith, Director of Production Operations at Centro and author of the book "Operations Anti-patterns, DevOps Solutions", for an engaging conversation about building reliable teams using DevOps principles.

Fireside Chat with Jesse Robbins and Kolton Andrus Failover Conf 2021

Apr 29, 2021 By Gremlin In Gremlin

Long before Chaos Engineering was even a phrase, Jesse Robbins was Amazon.com's "Master of Disaster", using intentional failure to help the company become more reliable. Kolton Andrus (CEO at Gremlin) sits down with Jesse to learn more about his early work with GameDays, the evolution of reliability, and where the future of SRE lies.

Fireside Chat with Inés Sombra and Ana Medina Failover Conf 2021

Apr 29, 2021 By Gremlin In Gremlin

Reliability is a requirement for the modern internet. Ana Medina joins Inés Sombra, Sr. Director of Engineering at Fastly, to discuss their approach to resilience, how the past year has influenced the way they work, and what practices your engineering organization can adopt to become more reliable.

What's Next for DevOps by Emily Freeman Failover Conf 2021

Apr 28, 2021 By Gremlin In Gremlin

For over a decade, the DevOps movement has been using cultural change to power technological transformation and help companies deliver better products faster and more reliably. While many organizations have embraced this change and reaped the benefits, it hasn't come without challenges, and many more remain. In this session, Emily Freeman (author of DevOps for Dummies) shares what's next for DevOps and how it will impact your organization.

The Evolution of Observability and Monitoring panel discussion Failover Conf 2021

Apr 28, 2021 By Gremlin In Gremlin

Observability and monitoring are critical to detecting and troubleshooting problems to build more reliable applications. As our systems become increasingly complex, our tools for getting this crucial visibility and the way we respond need to evolve too. We'll sit down with SRE leaders to discuss the processes they use to get the most insight into their applications, how they've increased the speed of detection and response, and what organizations need to do to stay on top of growing complexity.
