How to fail with Serverless

Jeremy Daly

Failover Conf 2020

Gremlin

May 5, 2020

This talk was presented at Failover Conf on April 21, 2020.

Everything fails all the time. Knowing how to deal with these failures in serverless applications is essential to building resilient, highly available systems. In traditional monolithic applications, catching errors and handling retries are relatively straightforward. But as our systems become more distributed, we now have multiple (often asynchronous) components processing events from several sources, all with vastly different retry behaviors and failure mechanisms. Applying old patterns can cause errors to get swallowed, creating brittle, unreliable systems that are difficult to debug and hard to maintain.

In this talk, we’ll explore the built-in tools and processes that AWS provides for handling failures in distributed serverless applications. We’ll discuss retry behaviors and strategies for dealing with errors in: