
September 2024

Interpreting your reliability test results

Gremlin’s default suite of reliability tests analyzes critical functions of modern services: scalability, redundancy, and resilience to dependency failures. Services that pass this suite of tests can be trusted to remain available during unexpected incidents. But what happens when a service fails a test? How do you take failed test results and turn them into actionable insights? This blog post aims to answer those questions.

Office Hours: Get better reliability on AWS with our new release

Part of the Gremlin Office Hours series: A monthly deep dive with Gremlin experts. Cloud platforms make it easier than ever to deploy massively scalable, distributed workloads, but this is a double-edged sword. There are reliability challenges unique to the cloud that didn’t exist before. Failed migrations, recurring incidents, and reliability toil take their toll.

Release Roundup August 2024

Over the past year, the Gremlin team has focused on giving you more tools to adapt Gremlin to your organization’s reliability needs. We started with customizable reliability tests, and now, we’ve released customizable role-based access controls (RBAC). We’ve also made it easier to target specific availability zones when running Failure Flags experiments, and to run experiments behind a proxy. Keep reading to learn more!