Chaos Engineering

Grubhub and JPMC Shift Reliability Testing Left at Chaos Conf 2020

Nov 5, 2020 By Taylor Smith In Gremlin

Gremlin’s Chaos Conf is always an exciting event, bringing together leaders at the forefront of Chaos Engineering practices. This year was no exception, moving beyond defining Chaos Engineering to more advanced adoption and best practices discussions.

Kubernetes Chaos Engineering with MayaData and Kublr

Nov 5, 2020 By Kublr In Kublr

If you enjoyed this webinar, you'll enjoy our other on-demand videos! Visit www.kublr.com to increase your Kubernetes knowledge.

ObservabilityCON Day 4 recap: a panel discussion on observability (and its future), the benefits of Chaos Engineering, and an observability demo showcase

Oct 30, 2020 By Joey Bartolomeo In Grafana

Over the past four days, Grafana Labs' ObservabilityCON 2020 brought together the Grafana community for talks dedicated to observability. We hope you enjoyed all of the sessions, which are now available on demand (links are on the schedule on the event page). The conference wrapped up with predictions and advice from observability experts, lessons in failure, and Grafana Labs team members showcasing ways Grafana and other tools fit into an observability workflow.

Chaos Engineering: How to create an automated Chaos Gauntlet with Gremlin and Jenkins on AWS

Oct 29, 2020 By Gremlin In Gremlin

In this video, we will demonstrate how to use Gremlin and Jenkins to create an automated Chaos Gauntlet. This will be done using Jenkins Pipelines and Stages to inject a controlled amount of failure with the Gremlin API. We then add a final stage that allows you to optionally halt the attack from the pipeline, rather than having to wait for the full duration of the attack.
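
The staged escalation described above can be sketched in Python. This is a minimal illustration, not Gremlin's or Jenkins' API: `Stage`, `run_gauntlet`, and the `start_attack`, `halt_attack`, and `healthy` callables are hypothetical stand-ins for pipeline stages that would call the Gremlin API and check service health. The sketch assumes the pipeline halts the running attack as soon as a health check fails, instead of waiting out the attack's full duration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Stage:
    """A hypothetical pipeline stage that runs one attack at a fixed magnitude."""
    name: str
    cpu_percent: int  # assumed magnitude, e.g. percent of CPU to consume


@dataclass
class GauntletResult:
    completed: List[str] = field(default_factory=list)
    halted_at: Optional[str] = None


def run_gauntlet(
    stages: List[Stage],
    start_attack: Callable[[Stage], str],  # returns an attack ID (hypothetical)
    halt_attack: Callable[[str], None],    # stops a running attack early
    healthy: Callable[[Stage], bool],      # health check while the attack runs
) -> GauntletResult:
    """Run stages in order of escalating failure; halt on the first failed check."""
    result = GauntletResult()
    for stage in stages:
        attack_id = start_attack(stage)
        if not healthy(stage):
            # Halt from the pipeline rather than waiting for the attack to finish.
            halt_attack(attack_id)
            result.halted_at = stage.name
            break
        result.completed.append(stage.name)
    return result


# Usage with fakes: the service is assumed to tolerate CPU load below 90%.
halted: List[str] = []
result = run_gauntlet(
    [Stage("low", 25), Stage("medium", 50), Stage("high", 90)],
    start_attack=lambda s: f"attack-{s.name}",
    halt_attack=halted.append,
    healthy=lambda s: s.cpu_percent < 90,
)
print(result.completed, result.halted_at, halted)
# prints ['low', 'medium'] high ['attack-high']
```

Ordering stages from low to high magnitude keeps the blast radius small: if a mild attack already degrades the service, the harsher stages never run.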

Breaking Serverless Things on Purpose: Chaos Engineering in Stateless Environments - Emrah Samdan

Oct 15, 2020 By Gremlin In Gremlin

Serverless has enabled us to build highly distributed applications, with more granular functions and ultimate scalability. However, it has also spread the risk of failure from a single microservice across many serverless functions and resources. You might be able to predict and design for certain troublesome issues, but there are many more that you probably cannot easily plan for. How do you build a resilient system under such highly distributed circumstances? The answer is Chaos Engineering: breaking things on purpose to observe how the whole system reacts.
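
One common way to break serverless things on purpose is to wrap a function handler so that some invocations fail or slow down. The Python decorator below is a hypothetical sketch, not a Gremlin feature: `inject_faults`, `InjectedFault`, and the `error_rate`, `latency_s`, and `enabled` parameters are illustrative names, and the `(event, context)` signature assumes an AWS Lambda-style Python handler.

```python
import functools
import random
import time
from typing import Any, Callable, Optional


class InjectedFault(Exception):
    """Raised when the wrapper deliberately fails an invocation."""


def inject_faults(
    error_rate: float = 0.0,
    latency_s: float = 0.0,
    rng: Optional[random.Random] = None,
    enabled: bool = True,
) -> Callable:
    """Wrap a handler so a fraction of calls fail or slow down on purpose."""
    if rng is None:
        rng = random.Random()

    def decorator(handler: Callable[[Any, Any], Any]) -> Callable[[Any, Any], Any]:
        @functools.wraps(handler)
        def wrapper(event: Any, context: Any) -> Any:
            if enabled:
                if latency_s > 0:
                    time.sleep(latency_s)  # simulate a slow downstream dependency
                if rng.random() < error_rate:
                    raise InjectedFault("chaos: injected failure")
            return handler(event, context)
        return wrapper
    return decorator


# Usage: a seeded RNG makes the experiment reproducible.
flaky = inject_faults(error_rate=0.3, rng=random.Random(42))(lambda event, context: "ok")
failures = 0
for _ in range(1000):
    try:
        flaky({}, None)
    except InjectedFault:
        failures += 1
print(failures)  # expect a count near 300
```

Keeping an `enabled` switch lets the same wrapped handler ship to production with injection turned off until an experiment is deliberately started.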

Chaos Engineering: The Path to Reliability - Kolton Andrus

Oct 15, 2020 By Gremlin In Gremlin

We’re all here for the same purpose: to ensure the systems we build operate reliably. This is a hard task, one that must balance people, process, and technology under adverse conditions. We operate with incomplete information, assessing risks and dealing with emerging issues. We’ve found Chaos Engineering to be a valuable tool in addressing these concerns. Learn from real-world examples what works, what doesn’t, and what the future holds.

Identifying Hidden Dependencies - Liz Fong-Jones

Oct 15, 2020 By Gremlin In Gremlin

You don't need to write automation or deploy on Kubernetes to gain benefits from resilience engineering! Learn how we at Honeycomb improved the reliability of our ZooKeeper, Kafka, and stateful storage systems by terminating nodes on purpose. We'll discuss the initial manual experiments we ran, the bugs we uncovered in our automatic replacement tools, and the steps we needed to take to progress toward continuously running the experiments. Today, no node at Honeycomb lives longer than 12 months, and we automatically recycle nodes every week.
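
Recycling nodes on a schedule needs a rule for which nodes to terminate on each run. The sketch below is a hypothetical illustration, not Honeycomb's tooling: `Node`, `nodes_to_recycle`, `max_age`, and `batch_size` are assumed names. It picks the oldest nodes that have exceeded a maximum age and caps how many are terminated per run, so a bug in automatic replacement surfaces on one node at a time rather than across a whole cluster.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Node:
    name: str
    launched_at: datetime


def nodes_to_recycle(
    nodes: List[Node], now: datetime, max_age: timedelta, batch_size: int = 1
) -> List[Node]:
    """Return the oldest nodes past max_age, capped at batch_size per run."""
    expired = [n for n in nodes if now - n.launched_at > max_age]
    expired.sort(key=lambda n: n.launched_at)  # oldest first
    return expired[:batch_size]


# Usage with made-up node names and ages, assuming a weekly recycling job.
now = datetime(2020, 10, 15)
fleet = [
    Node("kafka-1", now - timedelta(days=30)),
    Node("kafka-2", now - timedelta(days=3)),
    Node("zk-1", now - timedelta(days=10)),
]
picked = nodes_to_recycle(fleet, now, max_age=timedelta(days=7))
print([n.name for n in picked])  # prints ['kafka-1']
```

Raising `batch_size` recycles faster, but terminating one node per run keeps each experiment small enough that a failed replacement is easy to spot and roll back.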

Lessons from Incident Management and Postmortems at Atlassian - Jim Severino

Oct 15, 2020 By Gremlin In Gremlin

How do you run incidents and postmortems at a company with thousands of engineers spread across the globe? Jim Severino shares what worked (and what didn't work) for Atlassian.

Looking back on Chaos Conf 2020

Oct 15, 2020 By Andre Newman In Gremlin

It’s already been a week since we closed our third annual Chaos Conf! While we were forced to take the conference online, this meant that more of you could join us. Over 3,500 people signed up to help make this the world’s largest Chaos Engineering conference. That’s 5x as many as in 2019, and nearly 10x as many as in 2018! This is a testament to the growth of Chaos Engineering as a practice across many different industries and around the world.

Incident Ready: How to Chaos Engineer Your Incident Response Process | FireHydrant

Oct 15, 2020 By FireHydrant In FireHydrant

We’re pretty sure using a real incident to test a new response process is not the best idea. So, how do you test your process ahead of time? In this video, FireHydrant CEO Robert Ross shares how FireHydrant customers apply best practices to break, mitigate, resolve, and fireproof incident processes. We’ll show you how to use Chaos Engineering principles to stress-test three critical parts of a great process.
