In a world where Software as a Service (SaaS) products are integral to daily life, maintaining uninterrupted service for end-users is paramount. However, stuff happens. When it does, our most valuable response (other than restoring service ASAP) is to review the series of events that led up to the incident and learn from them. On August 25th, 2023, at 7:02 AM NZT, Raygun experienced a significant incident that impacted our API ingestion cluster, leading to an outage lasting approximately 1 hour and 15 minutes. While this wasn't fun for anyone involved, this incident did prove to be a valuable learning experience, shedding light on the importance of infrastructure management and resilience.
Understanding the impact of each of your deployments is crucial, especially as they become increasingly frequent. Chances are, your team is either aiming to increase shipping velocity or has already started deploying "continuously" (which is to say, multiple times a day). The biggest tech teams at the likes of Amazon and Google deploy thousands of times daily, and Atlassian has found that 75% of enterprise DevOps teams call deployment frequency their most important success criteria. And while CD comes with a host of well-established benefits, it also introduces a heightened risk of introducing new errors and issues.
Object-oriented (not orientated!) design is a fundamental principle of modern software engineering, a crucial concept that every developer needs to understand and employ effectively. Software design patterns like object-oriented design serve as universal solutions to common problems, across a range of instances and domains. As software engineers advance in their careers, they actually often start using these patterns instinctively, even without knowing it.