Chaos Engineering

How Gremlin runs a GameDay

May 10, 2022 By Sydney Lesser In Gremlin

You might be familiar with GameDays at this point. From watching our Introduction to GameDay webinar, viewing our demo video, and reading our tutorial, you’ve probably learned that GameDays were created to increase reliability by purposely creating major failures on a regular basis. Better yet, perhaps your own team has run a GameDay and learned something new about its services’ behavior during failure scenarios.

Introduction to GameDay webinar

May 10, 2022 By Gremlin In Gremlin

Learn all about Gremlin's GameDay feature in this webinar presented by Sydney Lesser and Andre Newman. GameDays are organized team events to proactively improve reliability using Chaos Engineering principles. Gremlin makes it easier than ever to prepare, execute, and learn from them. Increase your system’s reliability with safe, secure, and simple GameDays.

How to run a GameDay using Gremlin

May 10, 2022 By Gremlin In Gremlin

Learn how to run a GameDay in Gremlin. This video walks you through creating a GameDay, adding and running Scenarios, recording your observations, and linking to Jira in the Gremlin web app.

Site Reliability Chats (May 4, 2022)

May 4, 2022 By Gremlin In Gremlin

Podcast: Break Things on Purpose | Natalie Conklin: Learning to Embrace Change

May 3, 2022 By Julie Gunderson In Gremlin

Natalie Conklin, tamer of chaos and Head of Engineering here at Gremlin, joins us to talk about embracing change, working alongside each other, and building more reliable systems. Natalie has a talk coming up at DevOpsDays Boise titled “Embracing Change Fearlessly.” The talk focuses on enabling teams to take calculated risks and on having the guts to follow through on them. Natalie spent time working in India, which helped solidify her “fearlessly” philosophy.

Site Reliability Chats (April 27, 2022)

Apr 27, 2022 By Gremlin In Gremlin

Site Reliability Chats (Apr 20, 2022)

Apr 20, 2022 By Gremlin In Gremlin

In this episode, Julie and Jason share updates on the Atlassian outage, a new incident at Cerner, and problems at the IRS. They also cover post-incident investigations from Cloudflare and Datadog.

Podcast: Break Things on Purpose | JJ Tang: People, Process, Culture, Tools

Apr 19, 2022 By Jason Yee In Gremlin

For this episode, we’re continuing to “Build Things on Purpose” with JJ Tang, co-founder of Rootly, who joins us to talk about incident response, the tool he’s built, and the many lessons he has learned from incidents. Rootly aims to automate some of the more tedious work around incidents and to keep that work consistent. JJ chats about why he and his co-founder built Rootly, and the reliability problems they’re trying to fix and eliminate.

Chaos Engineering & Autonomous Optimization combined to maximize resilience to failure

Apr 14, 2022 By Kyle McMeekin In Gremlin

Today’s enterprises are struggling to cope with the complexities of their environments, technologies, and applications. On top of these challenges, they face faster release rates and the need to always deliver the highest level of performance and availability to end users at the lowest possible cost.

Site Reliability Chats (Apr 13, 2022)

Apr 13, 2022 By Gremlin In Gremlin

In this episode, Julie and Jason cover recent outages of the Dutch NS trains and American Express, as well as the ongoing, long-running incident at Atlassian. In positive news, they cover the acquisitions of Puppet by Perforce and ChaosNative by Harness, and Grafana Labs' Series D funding.
