The latest News and Information on Service Reliability Engineering and related technologies.
Any engineered system does not guarantee 100% uptime. There are bound to be some unforeseen system failures that cause downtime for the customers or create a poor customer experience. It is, therefore, best practice to take into account a margin for plausible failures. An error budget is this margin of error that the customer is informed about beforehand to secure tolerance during system failure for a decided number of hours.
A Service Level Agreement (SLA) consists of many service commitments. It is an essential part of a contract to outsource software development or software support between two or more parties, specifying the duties and the quality and type of service a company would provide for a fee to a customer.
Monitoring is an essential function of enterprise SRE teams and a critical component of business service deliverability. Its importance has only grown as enterprise environments and technologies continue to evolve at a rapid pace. Unfortunately, traditional monitoring is no longer enough.
Starting small and scaling your systems to serve billions of requests per month is never an easy path, so how do you build an infrastructure whilst making the right decisions and compromises for your services? Choosing the right technology stack and ensuring your CI/CD pipeline is reliable are two key steps towards this which we will explore.