The SRE Report 2025: Highlighting Critical Trends in Site Reliability Engineering
NEW YORK, USA, January 13, 2025 — Catchpoint, the leader in Internet Performance Monitoring (IPM), today unveiled its annual site reliability engineering (SRE) report for 2025. The industry-leading report offers unique insights from over 300 professionals spanning the global IT and reliability community, including engineers, managers, architects, and executives.
Download the 2025 SRE Report: https://www.catchpoint.com/asset/2025-sre-report
Now in its seventh year, the SRE Report is widely considered the authentic, independent voice of the reliability community and underscores the role of SRE as an indispensable practice in maintaining high-performing, resilient digital services and applications. This year's report highlights valuable insights into the challenges and opportunities facing SRE teams in an era marked by rapid technological advancement and escalating performance expectations.
“Success starts with individuals owning their role in the bigger picture, and that starts with embracing SRE as more than a technical enhancement,” said Mehdi Daoudi, CEO and co-founder of Catchpoint. “When teams understand how their work drives outcomes, it becomes easier to align around the opportunities that matter and the steps to seize them, and what’s a major concern this year is that organizations are feeling pressured to prioritize release schedules over reliability.”
Key findings from the report include:
- Slow is the new down: 53% of organizations agree that poor performance is as harmful as downtime, elevating user experience to a key reliability metric.
- Toil levels rise despite AI: After five years of steady decline, the median reported percentage of work spent on toil has increased to 30% from 25% in 2024, raising questions about AI’s impact on daily workloads.
- Organizational priorities under pressure: Over two-thirds of respondents acknowledge frequently feeling pressured to prioritize release schedules over reliability, reflecting the ongoing struggle between agility and stability.
- Multiple monitoring tools are the norm: Most organizations use between 2 and 10 monitoring or observability tools, emphasizing a “value over cost” mindset for effective monitoring across technology stacks.
- AI training in demand, but time-constrained: 30% of respondents prioritized technical training on AI. As the second most selected sentiment, this highlights a strong desire for upskilling, even as the top sentiment (37%) reflects a cautious approach to AI implementations.
- Incidents as a certainty: 40% of respondents reported handling between 1 and 5 incidents in the last 30 days. Notably, incident response is a shared responsibility across all levels, with higher-level managers as involved as individual contributors.
- Continued misalignment on reliability priorities: While overall responses paint a positive picture of reliability practices, significant differences emerge when analyzed by managerial responsibility, highlighting a gap in alignment on priorities and approaches.
“What was most eye-opening from our report findings this year was that, for most teams, it seems the burden of operational tasks has grown for the first time in five years,” said Leo Vasiliou, Director, Product Marketing at Catchpoint and author of the SRE Report. “The expectation was that AI would reduce toil, not exacerbate it.”
Methodology
The 2025 SRE Report is based on insights gathered from the annual SRE Survey, which was open for six weeks during July and August 2024. The survey received 301 responses from professionals across the globe, representing a wide range of roles and levels of managerial responsibility within reliability engineering.
The majority of respondents were located in North America (68%), followed by Europe (16%) and Asia (11%). Company sizes varied, with 25% of respondents working at organizations with 1,001–10,000 employees and 15% at companies with 10,001–100,000 employees. This diversity ensures that the report captures a broad and comprehensive perspective on the state of site reliability engineering practices worldwide.
About Catchpoint
In today’s exacting digital age, performance is paramount. The top online retailers, Global 2000, CDNs, cloud service providers, and xSPs all rely on Catchpoint to ensure high performance and digital resilience by catching issues across the Internet Stack before they impact their customers, workforce, or digital experiences. Catchpoint’s Internet Performance Monitoring (IPM) suite offers Internet Synthetics, RUM, BGP, Tracing, performance optimization, high-fidelity data, and flexible visualizations with advanced analytics derived from the world’s largest, most detailed, active observability network.
Contacts
Emily Fang, Greenough Communications
Catchpoint@Greenoughagency.com