
Gremlin

Podcast: Break Things on Purpose | JJ Tang: People, Process, Culture, Tools

For this episode we’re continuing to “Build Things on Purpose” with JJ Tang, co-founder of Rootly, who joins us to talk about incident response, the tool he’s built, and the many lessons he’s learned from incidents. Rootly aims to automate some of the more tedious work around incidents and to keep that work consistent. JJ explains why he and his co-founder built Rootly and which reliability problems they’re trying to fix and eliminate.

Chaos Engineering & Autonomous Optimization combined to maximize resilience to failure

Today’s enterprises are struggling to cope with the complexity of their environments, technologies, and applications. On top of these challenges, they face faster release rates and the need to deliver the highest possible performance and availability to end users at the lowest possible cost.

Podcast: Break Things on Purpose | Elizabeth Lawler: Creating Maps for Code

For this episode of “Build Things on Purpose” we are joined by Elizabeth Lawler, founder of AppLand, the creators of AppMap. Elizabeth is here to chat about the challenges of building modern, complex software and the tool she built to serve as a “Google Maps for code” for developers. AppMap is designed to give developers a visual, real-time view that helps clarify the code they are writing.

Getting started with DNS attacks

Whenever an online service goes down, you're likely to hear three words: "it was DNS!" Blaming DNS might be a running joke among network admins and engineers, but it's one rooted in experience. DNS problems are known for causing massive, Internet-wide outages such as the 2021 Akamai outage that temporarily made the websites for Delta Air Lines, American Express, Airbnb, and others unreachable.

Getting Started with Gremlin Attacks

Gremlin provides a variety of ways to test the resilience of your systems, which we call "attacks". Running different attacks lets you uncover unexpected behaviors, validate resilience mechanisms, and improve the overall reliability of your systems and services. This ebook explains each of Gremlin's attacks in detail, including what each attack does, how it impacts your systems, and the technical and business objectives it helps you achieve.