How Nagarro used Gremlin to prevent a cascading failure outage
Check out how Nagarro used Gremlin to help a client prevent a cascading failure before it caused an outage. "We once tested a critical piece of software that was handling millions of online transactions a day. The design was fail-safe, providing redundancy for critical services by deploying multiple instances on different VMs. We ran a virtual machine termination test to bring down one instance of that service, with the hypothesis that it would recover automatically. Well, the service did recover automatically, but the system saw a cascading failure."