
Round Robin escalation policies: do's and don'ts

The concept of Round Robin comes from sports. The name has nothing to do with anyone called Robin; it derives from the French word ruban (ribbon). In a Round Robin tournament, every participant faces every other participant in turn. When applied to on-call schedules, a Round Robin escalation policy means that the responders assigned to a level take turns responding to alerts. When is this strategy useful, and when isn't it?
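
As a rough illustration of the turn-taking idea, here is a minimal Python sketch. The responder names, the alert labels, and the `round_robin_assigner` helper are all hypothetical, and this is not how any particular on-call product implements its policies.

```python
from itertools import cycle


def round_robin_assigner(responders):
    """Return a function that hands each new alert to the next responder in turn."""
    if not responders:
        raise ValueError("at least one responder is required")
    rotation = cycle(responders)  # endless iterator: ana, ben, chris, ana, ...

    def assign(alert):
        # Pair the alert with whoever is next in the rotation.
        return alert, next(rotation)

    return assign


# Hypothetical responders on a single escalation level.
assign = round_robin_assigner(["ana", "ben", "chris"])
assignments = [assign(f"alert-{i}") for i in range(1, 5)]
for alert, responder in assignments:
    print(f"{alert} -> {responder}")
```

With three responders, the fourth alert goes back to the first person in the list, so the load is spread evenly regardless of who handled the previous incident.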

How Meta and Google use AI to improve incident response

The world population in 2024 is approximately 8.12 billion people. Of these, 4.3 billion use Google regularly, while 3.74 billion are active users of Meta's platforms. Any disruption involving these tech giants is sure to make headlines, as seen in Google's recent UniSuper incident. The scale of these companies brings fascinating challenges in every aspect of their operations, including incident response.

What we can learn from Google's UniSuper incident comms

Earlier this month, an inadvertent misconfiguration in an internal tool used by Google Cloud resulted in the deletion of a user’s GCVE Private Cloud. The user in question? UniSuper Australia — a $125 billion Australian pension fund with over 600,000 users. In this post, Ashley reflects on the communications shared and what we can learn from them.

How Incidents Foster Leadership

To become battle-tested, you need to go through battles, not just read books or mentor newcomers. Both are helpful, but the stakes are low. On the other hand, high-stakes jobs, such as running a big project or managing a team, are hard to get when you lack experience. So how can we solve this dilemma? Enter incident response.