We’ve been monitoring hundreds of thousands of serverless backend components at Dashbird for more than two years. In our experience, serverless infrastructure failures boil down to: These isolated faults become failures because of the dependencies in our cloud architectures (ref. Difference of Fault vs. Failure). If a Lambda function relies on a database that is under stress, the entire API may start returning 5XX errors.
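The fault-to-failure chain can be illustrated with a minimal Python sketch. The `make_handler` factory, the injected `query_db` callable and the `DatabaseTimeout` exception are hypothetical names used only for illustration. The database is where the fault lives, but the API consumer only sees the Lambda function answering with a 5XX status.

```python
import json


class DatabaseTimeout(Exception):
    """Hypothetical error raised when the database is too stressed to answer in time."""


def make_handler(query_db):
    """Build a Lambda-style handler around a hypothetical database dependency."""

    def handler(event, context):
        try:
            rows = query_db(event.get("id"))
        except DatabaseTimeout:
            # The fault is in the database, but it surfaces as an API failure.
            return {"statusCode": 503, "body": json.dumps({"error": "database unavailable"})}
        return {"statusCode": 200, "body": json.dumps(rows)}

    return handler
```

Injecting the dependency keeps the standard `(event, context)` handler signature while letting the stressed database be simulated without any cloud resources.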
Dashbird, a platform for serverless application monitoring, has raised $2.1 million in a seed round. The investment was led by Paladin Capital Group, with participation from Passion Capital, Icebreaker.vc and Lemonade Stand.
We recently wrote about why serverless applications fail and how to design resilient architectures. Detecting early-stage failure indicators can be invaluable. With proper monitoring, developers stop waiting for the system to crash and instead take a proactive approach to resource allocation and architecture design, avoiding bottlenecks and performance degradation.
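Two early-stage indicators can be checked in a short Python sketch: invocation duration creeping toward the configured timeout, and memory usage creeping toward the configured limit. The `Invocation` record, the `early_warnings` function and the 80% threshold are assumptions chosen for illustration, not Dashbird's actual detection rules.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Invocation:
    duration_ms: float          # how long the invocation ran
    max_memory_used_mb: float   # peak memory reported for the invocation


def early_warnings(invocations: List[Invocation], timeout_ms: float,
                   memory_mb: float, threshold: float = 0.8) -> List[str]:
    """Flag functions approaching their limits before they actually fail (assumed 80% threshold)."""
    warnings: List[str] = []
    if not invocations:
        return warnings
    peak_duration = max(i.duration_ms for i in invocations)
    peak_memory = max(i.max_memory_used_mb for i in invocations)
    if peak_duration >= threshold * timeout_ms:
        warnings.append(f"duration at {peak_duration / timeout_ms:.0%} of timeout")
    if peak_memory >= threshold * memory_mb:
        warnings.append(f"memory at {peak_memory / memory_mb:.0%} of limit")
    return warnings
```

A function that warns here is still succeeding, which is the point: the signal arrives before timeouts or out-of-memory errors reach users.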
Cloud applications don’t just run flawlessly by magic. Many things can go wrong, and rest assured some of them will at some point. For small teams, handling these issues can be cumbersome and take a toll on development speed. A monitoring system detects these issues on behalf of the development team so that they can act accordingly. At Dashbird, though, we think there’s much more to it than just detecting and alerting on issues, especially for small teams of developers.
No matter how careful developers are or how comprehensive their pre-deployment tests are, there will always be some issues to deal with in production. When it comes to managing issues and ensuring application quality, two metrics should be on our radar: time to discover and time to resolve issues.
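Both metrics can be averaged across incidents, as in this Python sketch. The `Incident` record and `mean_times` function are hypothetical. Definitions vary between teams, so this sketch assumes time to discover runs from when the issue started to when it was detected, and time to resolve runs from detection to resolution.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass
class Incident:
    started_at: datetime   # when the issue began affecting production
    detected_at: datetime  # when the team (or its monitoring) noticed it
    resolved_at: datetime  # when the fix was in place


def mean_times(incidents: List[Incident]) -> Tuple[timedelta, timedelta]:
    """Return (mean time to discover, mean time to resolve) under the assumed definitions."""
    if not incidents:
        raise ValueError("at least one incident is required")
    n = len(incidents)
    discover = sum((i.detected_at - i.started_at for i in incidents), timedelta()) / n
    resolve = sum((i.resolved_at - i.detected_at for i in incidents), timedelta()) / n
    return discover, resolve
```

Tracking the two numbers separately shows whether an improvement came from faster detection (better monitoring) or faster fixes (better debugging context).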
How long should application logs be retained? That is a common question I see among developers. Most of the time, nobody cares about system logs. But when things go south, we absolutely need them, like water in the desert! At Dashbird, we have compiled a list of criteria to determine a reasonable retention policy for application logs. There is no one-size-fits-all answer, though. The analytical dimensions below give a relative sense of how long the retention period should be.
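Once a retention period has been chosen, it still has to be applied. The following Python sketch assumes AWS Lambda logs stored in CloudWatch Logs under the default `/aws/lambda/` prefix. It uses the boto3 CloudWatch Logs client's `describe_log_groups` paginator and `put_retention_policy` call. The `set_lambda_log_retention` helper is a hypothetical name, and CloudWatch accepts only a fixed set of retention values (for example 7, 30 or 90 days).

```python
LAMBDA_LOG_PREFIX = "/aws/lambda/"  # default log group prefix for Lambda functions


def set_lambda_log_retention(logs_client, days: int) -> list:
    """Apply one retention period to every Lambda log group; return the groups changed."""
    updated = []
    paginator = logs_client.get_paginator("describe_log_groups")
    for page in paginator.paginate(logGroupNamePrefix=LAMBDA_LOG_PREFIX):
        for group in page["logGroups"]:
            # Groups without "retentionInDays" currently keep logs forever.
            if group.get("retentionInDays") != days:
                logs_client.put_retention_policy(
                    logGroupName=group["logGroupName"], retentionInDays=days
                )
                updated.append(group["logGroupName"])
    return updated
```

With credentials configured, it could be called as `set_lambda_log_retention(boto3.client("logs"), 30)`. Passing the client in keeps the helper testable without an AWS account.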