Chaos Engineering

How Detected Risks helps you find reliability risks in minutes, without running any tests

This video showcases Gremlin's Detected Risks feature. Detected risks are high-priority reliability concerns that Gremlin automatically identifies in an environment. These include misconfigurations, bad default values, and reliability anti-patterns. Gremlin prioritizes these risks based on severity and impact, giving instantaneous feedback on risks and action items to improve the reliability and stability of each service.

Four Pillars of a Best-in-Class Reliability Program

Reliability impacts every organization, whether you plan for it or not. Leading companies take matters into their own hands and get ahead of incidents by building reliability programs. But since many of these programs are still nascent, how do you know what good looks like? Of course, the right tools and technology play an important role by enabling your team to uncover reliability risks before they impact users. But improving reliability goes beyond technology.

Announcing the Gremlin Enterprise Chaos Engineering Certification (GECEC) program

We knew Chaos Engineering was in high demand when we first launched the Gremlin certifications in 2021. But we had no idea our Chaos Engineering certification programs would be such a success. There’s a reason: the market is looking for professionals who know how to wield Chaos Engineering well, and Gremlin's certification has become the gold standard for learning the principles of Chaos Engineering and demonstrating proficiency.

Reliability Best Practices: How Gremlin Uses Gremlin

Ensuring software availability is essential for any SaaS company—including Gremlin. To do that, our teams need to identify the reliability risks hiding in our systems. That’s why our development, platform, and SRE teams use Gremlin regularly to perform Chaos Engineering experiments, run reliability tests, and track the reliability of our systems against our standards. Along the way they’ve picked up a thing or two about how to find and fix reliability risks with Gremlin.

Understanding Chaos Engineering and its Benefits

In today's fast-paced technological landscape, ensuring the resilience and dependability of systems is crucial. This is where Chaos Engineering comes in, transforming how organizations approach system testing and fortification. By purposefully introducing controlled interruptions and failures, Chaos Engineering helps find vulnerabilities that could go undetected under normal circumstances.
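
To make the idea of a controlled failure concrete, here is a minimal Python sketch of fault injection. It is an illustration only, not how Gremlin or any other tool works: every name in it (`with_fault_injection`, `fetch_price`, `run_experiment`, `InjectedFailure`) is hypothetical. It wraps a stand-in dependency so that a configurable fraction of calls fail on purpose, then counts how often the caller has to fall back.

```python
import random


class InjectedFailure(Exception):
    """Raised by the injector in place of a real dependency error."""


def with_fault_injection(func, failure_rate, rng):
    """Wrap func so that roughly failure_rate of calls fail on purpose."""
    def wrapper(*args, **kwargs):
        if rng.random() < failure_rate:
            raise InjectedFailure("injected failure")
        return func(*args, **kwargs)
    return wrapper


def fetch_price(item):
    # Hypothetical dependency; a real system would call a remote service.
    return {"apple": 1.25}.get(item, 0.0)


def fetch_price_with_fallback(fetch, item, fallback=None):
    """Caller under test: returns the fallback when the dependency fails."""
    try:
        return fetch(item)
    except InjectedFailure:
        return fallback


def run_experiment(trials=1000, failure_rate=0.3, seed=42):
    """Run the experiment and report how many calls were degraded."""
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    flaky_fetch = with_fault_injection(fetch_price, failure_rate, rng)
    results = [fetch_price_with_fallback(flaky_fetch, "apple") for _ in range(trials)]
    degraded = sum(1 for r in results if r is None)
    return {"trials": trials, "degraded": degraded, "ok": trials - degraded}


if __name__ == "__main__":
    print(run_experiment())
```

The failure rate and trial count bound the experiment, which is what makes the interruption "controlled": the caller either handles every injected failure through its fallback, or the experiment surfaces a weakness before a real outage does.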

How to Show Reliability Results to Your Organization

Building momentum for a reliability program can be tough. Improving reliability takes time, effort, and resources. But when everything from launching new features to improving security demands those same resources, it can be a struggle to get the buy-in you need to address reliability risks. And it makes sense! If a team spends time patching a known security bug or creating a new feature, they have a clear demonstration of the value created.

Chaos Engineering 2023 with Chaos Mesh

We've seen a tremendous transition in the architecture of our systems over the years, from basic, linear systems to increasingly sophisticated, non-linear systems. We've moved away from monolithic programs, where a single person could comprehend the entire operation of a system, and toward a distributed world dominated by a microservices design.