SRE

The latest news and information on Site Reliability Engineering and related technologies.

SRE Trends from AWS re:Invent 2022

In November/December 2022 I attended AWS re:Invent in Las Vegas. It was certainly an experience for this small-town kid from New Zealand, and one that I took a lot away from. While I was at the conference, I took the time to walk around and take notes. In this article I will share the trends I observed that I think will affect SRE work in 2023 and beyond, including: ...and others.

How to talk to your executive leadership team about reliability

Product reliability requires investment from all areas of the business. Technology leaders must effectively communicate the implications of service reliability to the rest of the organization. As a leader, how do you prove that a more reliable product is critical to success? Experts from BetterCloud, Machinify and Blameless come together to discuss how to talk to your executive leadership team about reliability in this webinar.

Understanding Site Reliability Engineering (SRE)

Success in this modern age of digital services and operations is found when businesses are able to prioritize effective digital processes. Because of this, IT teams are constantly looking for ways to improve their IT operations by making them efficient, reliable, and scalable. One way this is accomplished is through site reliability engineering (SRE). LinkedIn listed SRE as the 21st fastest growing job in the U.S. in January 2022. What is SRE, and why is it in such high demand?

Incident Management Tools - Do I Even Need Them?

Software is hard… Maintaining software reliability is harder than it used to be. Software systems have grown dramatically in complexity as they're applied in a wider range of applications and environments, many of which have become fundamental to the everyday functioning of our society. At the same time, the pace of software development and release is faster than ever. Shipping new features faster than competitors has become the key to success in a rapidly changing market.

Why SREs need better visibility, not more tools

As a site reliability engineer (SRE), you juggle a lot of moving targets. You keep tabs on your operational environment’s health and maximize service levels, all while trying to scale your business and exceed client expectations. To hold it all together, you’ve likely implemented a hybrid cloud strategy to keep a watchful eye over everything: your on-premises infrastructure, containers, and numerous cloud deployments.

Introducing Levitate: 'uplifting' your metrics woes because self-management sucks like gravity

Managing your own time series database is painful. We’ve moved from servers to services, and yet, monitoring metrics data is primitive. Our managed time series database powers mission-critical workloads for monitoring, at a fraction of the cost.

SRE Report 2023: Are we Aligned? Yes. No. Maybe.

Each year of the SRE Report, there’s a trend or anti-pattern that leaps out and makes us pause and reflect. Last year, for example, we found a huge drop in global toil levels. With the whole world working from home for a full year, it made sense that global toil levels would drop, right? But this year, despite the great reopening underway, toil levels dropped even further - it's a paradox, one which no doubt will require its own scrutiny.