What is SRE?
Site Reliability Engineering (SRE) is a practice for managing the reliability of systems that began at Google in the early 2000s. Ben Treynor Sloss from Google started the first SRE team and coined the name.
The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.
Most companies today compile a set of metrics for their product teams to report regularly to company management. These include a variety of product performance metrics (usage frequency, churn rate, NPS, etc.). But many of them struggle with product discovery activities. So how do you track discovery?
I can tell you the day I knew I would be a Systems Administrator (the term SRE hadn’t been invented yet). My Linux professor, a brilliant engineer at NASA, said: "The best system administrators are the laziest." He went on to qualify that statement, but I had stopped listening. My fate was sealed.
If you’ve been following the U.K. healthcare landscape, you would know that the country has been considering replacing pagers for the longest time. This may soon materialize, partly accelerated by the challenges that doctors are facing during the COVID-19 pandemic. The pager replacement initiative not only signifies a pivotal shift from the aging infrastructure, but it also indicates how pagers have failed to thrive in today’s unprecedented times.
We are going through an incredibly difficult time of uncertainty, lockdowns, cutbacks, and even fear. Taking this time to optimize and rethink the way we do business is essential in ensuring we get back on track and return even stronger than before. Most of us have been working from home for months now and, in some cases, there is no end in sight. How are you and your operations holding up? Are you able to work, maintain, and control your infrastructure?
Constantly talking to your users about their business problems and incorporating solutions to them is key to the success of your product and company. There are many ways to incorporate the voice of your users into your product planning. Formulate an experience brief that’s less than 2 pages, or put together a 5-minute clip of user interviews. Best of all is to have devs join you in the interviews and discovery activities as well.
PagerDuty sat down with J. Paul Reed, a Senior Applied Resilience Engineer at Netflix, for an Ask Me Anything (AMA) to discuss best practices around postmortems. Reed is a prominent speaker and advocate of DevOps and operations complexity, and has over 15 years of experience in release engineering. His background in tech, along with his previous work at companies like Mozilla and VMware, gives him a unique perspective into the inner workings of innovative organizations.