The latest news and information on Site Reliability Engineering and related technologies.
Our 2020 SRE Report is ready! We launched the 2020 SRE survey this January with the goal of understanding the current state of SRE. The survey covered a range of topics. As we neared the end of the survey period, the SRE community was in the midst of a sudden change: SRE teams were forced to migrate to all-remote IT. We realized we could not provide an accurate analysis without considering how SRE teams were operating in this new environment.
It’s 3 AM and you are roused from sleep by the dull buzzing of your phone in the other room. Some sort of emergency, you conclude as you fumble with the lock screen. There it is: an alert that the API governing user registration is acting up. When we think about the lag between time of incident and time to respond, it’s not just about how long the system was down. How long it physically takes us to respond to the problem also contributes to downtime.