In some respects, security and reliability are competing priorities. Security controls may reduce reliability, and responding to security incidents may require mission-critical systems to be paused or shut down until they're secure. The recent security incident involving CircleCI, however, shows that it's not always necessary to choose between prioritizing security or reliability.
So you’ve decided to take advantage of Site Reliability Engineering by hiring SREs for your company. Now, you have a second decision to make: Exactly how many SREs to hire. Do you need just one or two SREs? Or should you build a sprawling SRE team, with a dozen or more SREs on hand to support your organization’s reliability needs? The answers to these questions will, of course, vary; every business’s needs are different.
Mature start-ups and scale-ups create wonderful and challenging environments for Engineers. As the product they’re creating matures and the brand becomes a successful one, the user base generally starts growing, and, for some companies, in places they might not expect it to grow. As that happens, new challenges arise for Engineers. One of these challenges is pretty straightforward to guess. Basically having a particular product available throughout different regions of the world.
Ask most SREs how many incidents they’d have to respond to in a perfect world, and their answer would probably be “zero.” After all, making software and infrastructure so reliable that incidents never occur is the dream that SREs are theoretically chasing. Reducing actual incidents by as much as possible is a noble goal. However, it’s important to recognize that incidents aren’t an SRE’s number one enemy.
Site Reliability Engineers (SREs) have a considerable set of tasks to juggle no matter where they work or how long their company has had an SRE practice. But if you’re the very first SRE to join an organization – as many SREs are these days, given that the SRE trend is trickling down into smaller and smaller companies – you face a special group of challenges. You may find it difficult to get buy-in for SRE from other technical teams.
What happens when the tools and services you depend on to drive Site Reliability Engineering turn out to be susceptible to reliability failures of their own? That’s the question that teams at about 400 businesses have presumably had to ask themselves this month in the wake of a major outage in Atlassian Cloud.
To embed or not to embed: That is the question. At least, that’s one of the questions that companies have to answer as they decide how to implement Site Reliability Engineering. They can either embed SREs into existing teams, or they can build a new, separate SRE team. Both approaches have their pros and cons. The right strategy for your company or team depends, of course, on your needs and priorities.
Site Reliability Engineering (SRE) teams and Platform Engineering teams share similar goals -- like maximizing automation and reducing toil -- and similar methodologies. But they have different priorities, and use somewhat different tools to achieve them. What are SREs, what are platform engineers and how is each role similar and different? This article explains.
If you’re an SRE, you might view AIOps with great excitement. By automating complex workflows and troubleshooting processes, AIOps could make your life as an SRE much easier. Alternatively, SREs may choose to view AIOps with disdain. They might think of AIOps as just a fancy buzzword that doesn’t live up to its promises, and that can become a distraction from the SRE tools that really matter. Which perspective is right?