Site Reliability Chats (Apr 20, 2022)
In this episode, Julie and Jason share updates on the Atlassian outage, a new incident at Cerner, and problems at the IRS. They also cover post-incident investigations from Cloudflare and Datadog.
Atlassian incident report: https://www.atlassian.com/engineering/april-2022-outage-update
Cloudflare incident report: https://blog.cloudflare.com/pipefail-how-a-missing-shell-option-slowed-cloudflare-down/
Datadog incident report: https://www.datadoghq.com/blog/engineering/grpc-dns-and-load-balancing-incident/#why-were-clients-sending-so-many-syn-requests