
November 2024

Integrating Gremlin with your observability tools

Part of the Gremlin Office Hours series: A monthly deep dive with Gremlin experts.

To get the most value out of Chaos Engineering and reliability testing, you need a way to observe your service’s behavior. Observability tools offer insight into how your systems are performing, but observability on its own isn’t enough. You need a way to monitor your systems while testing their reliability so you can determine whether your service passed or failed a test.

Building Resilience from Architecture to Production with AWS & Gremlin

Unreliable software can have a painful impact on your customers and your business, something we’ve all seen and felt during high-profile outages. And while building on the cloud with AWS unlocks improved scaling and reliability capabilities, the complexity of modern distributed systems can introduce reliability risks that cause outages. How can you be sure your systems are resilient to failure when they’re based on complex architecture, built by hundreds of teams, and updated almost constantly?

How reliability engineering can verify disaster recovery plans

Disaster recovery plans have always been a crucial part of running a business, especially for essential services like banks. These plans help keep your business up and running during a disaster or extreme scenario so you can be there for your customers when they need you the most.