Latest Posts

Gartner: tips for improving reliability

Jun 6, 2022 By Andre Newman In Gremlin

In their report titled “IT Resilience — 7 Tips for Improving Reliability, Tolerability and Disaster Recovery”, Gartner presents seven strategies for improving the resilience posture of your critical systems. These recommendations range from how to get started, to identifying IT hazards and risks to reliability, to capturing metrics and translating them into business value. In this blog, we’ll take a high-level look at the report and summarize some of its key findings.

Podcast: Break Things on Purpose | KubeCon, Kindness, and Legos with Michael Chenetz

May 31, 2022 By Jason Yee In Gremlin

In this episode, we chat with Cisco’s head of developer content, community, and events, Michael Chenetz. We discuss everything from KubeCon to kindness and Legos! Michael delves into some of the main themes he heard from creators at KubeCon, and we discuss methods for increasing adoption of new concepts in your organization. We have a conversation about attending live conferences, COVID protocol, and COVID shaming, and then we talk about how Legos can be used in talks to demonstrate concepts.

Podcast: Break Things on Purpose | Dan Isla: Astronomical Reliability

May 17, 2022 By Jason Yee In Gremlin

It’s time to shoot for the stars with Dan Isla, VP of Product at itopia, who joins us to talk about everything from the astronomical importance of reliability to time zones on Mars. Dan’s career has been propelled through jobs bordering on science fiction, including a history at NASA, where he worked on modernizing cloud computing, and loads more. Dan discusses the limited room for risk and failure in space travel with an anecdote from his work on Curiosity.

How Gremlin runs a GameDay

May 10, 2022 By Sydney Lesser In Gremlin

You might be familiar with GameDays at this point. From watching our Introduction to GameDay webinar, viewing our Demo video, and reading our tutorial, you’ve probably learned that GameDays were created with the goal of increasing reliability by purposely creating major failures on a regular basis. Better yet, perhaps your own team has run a GameDay and learned something new about their services’ behavior during failure scenarios.

Podcast: Break Things on Purpose | Natalie Conklin: Learning to Embrace Change

May 3, 2022 By Julie Gunderson In Gremlin

Natalie Conklin, tamer of chaos and Head of Engineering here at Gremlin, joins us to talk about embracing change, working alongside each other, and building more reliable systems. Natalie has a talk coming up at DevOpsDays Boise which she has titled “Embracing Change Fearlessly.” Her talk is oriented around enabling teams to take calculated risks and having the guts to take those risks. Natalie spent time working in India, which helped solidify her “fearlessly” philosophy.

Podcast: Break Things on Purpose | JJ Tang: People, Process, Culture, Tools

Apr 19, 2022 By Jason Yee In Gremlin

For this episode, we’re continuing to “Build Things on Purpose” with JJ Tang, co-founder of Rootly, who joins us to talk about incident response, the tool he’s built, and the many lessons he has learned from incidents. Rootly aims to automate some of the more tedious work around incidents and to keep that work consistent. JJ chats about why he and his co-founder built Rootly, and the reliability problems they’re trying to fix and eliminate.

Chaos Engineering & Autonomous Optimization combined to maximize resilience to failure

Apr 14, 2022 By Kyle McMeekin In Gremlin

Today’s enterprises are struggling to cope with the complexities of their environments, technologies, and applications. On top of these challenges, they face faster release rates, and the need to always deliver the highest level of performance and availability to end-users, at the lowest possible cost.

Podcast: Break Things on Purpose | Elizabeth Lawler: Creating Maps for Code

Apr 5, 2022 By Jason Yee In Gremlin

For this episode of “Build Things on Purpose” we are joined by Elizabeth Lawler, founder of AppLand, the creators of AppMap. Elizabeth is here to chat about the challenges of building modern, complex software and the tool she has built, which serves as a “Google maps for code” for developers. AppMap is designed to present code in a more visually driven way, helping to clarify, in real time, the process of writing it.

Getting started with DNS attacks

Mar 31, 2022 By Andre Newman In Gremlin

Whenever an online service goes down, you're likely to hear three words: "it was DNS!" Blaming DNS might be a running joke among network admins and engineers, but it's one rooted in experience. DNS problems are known for causing massive, Internet-wide outages such as the 2021 Akamai outage that temporarily made the websites for Delta Air Lines, American Express, Airbnb, and others unreachable.

Podcast: Break Things on Purpose | Chris Martello: Day of Darkness

Mar 22, 2022 By Julie Gunderson In Gremlin

Dad jokes lead the way in this episode as we interview Chris Martello, manager of application performance at Cengage. Chris is a wearer of many testing hats, but his passion is chaos and breaking things on purpose. With his background as a middle school science teacher, Chris found chaos engineering a natural fit when he made the jump to tech.
