Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on Service Reliability Engineering and related technologies.

But It's Not Our Fault! When Third-party Incidents Affect Your Service

Very few SaaS products exist completely independently. Between cloud service providers, payment processors, content delivery networks, and more, chances are you rely on external systems to keep your product working. When these systems fail, it can leave you feeling pretty helpless. In some cases you might have fallback options, but oftentimes all you can do is wait for recovery and clean up the fallout.

Azure Monitoring Agent: Key Features & Benefits

In today's rapidly evolving digital landscape, businesses increasingly rely on cloud computing and infrastructure to support their operations. As organizations migrate their workloads to the cloud, robust monitoring and management tools are paramount to ensure optimal performance, security, and efficiency. In response to this demand, Microsoft Azure has introduced the Azure Monitoring Agent (AMA), a powerful and versatile solution designed to enhance the monitoring capabilities of Azure resources.

Splashing into Data Lakes: The Reservoir of Observability

If you’re a systems engineer, SRE, or just someone with a love for tech buzzwords, you’ve likely heard about “data lakes”. Before we dive deep into this concept, let’s debunk the illusion: there aren’t any floaties or actual lakes involved! Instead, imagine a vast reservoir where you store loads and loads of raw data in its natural format. Now, pair this with the idea of observability and telemetry pipelines, and we have ourselves an engaging topic.

Rootly Raises $12 Million from Renegade Partners, Google Gradient Ventures, & XYZ Ventures

We are excited to announce that we have raised a $12M round of financing led by Renegade Partners with participation from Google Gradient Ventures (Google’s AI-focused venture fund) and XYZ Ventures. This brings our total funding to date to $15.2M ($20M CAD) alongside our other existing investors Y Combinator and 8VC.

SRE in Transition: From Startup to Enterprise

"Startups are defined by “ship or die”. As a result, SRE teams at a startup should be focused on enabling product engineers to ship features as quickly as possible. As your startup transitions from “we’ll run out of money in the next 18 months” to “we have more than 1000 engineers”, how should the SRE organization evolve and provide the best value through that transition (including booting one up if you don’t have one)? I will discuss specific ways the organization needs to evolve to meet this challenge, how the SRE org can advocate for and support this change (both in direct actions and in “influence”), and how the overhang of startup technical and cultural debt can make this shift more challenging (but also more necessary).

Tools and Trends in Site Reliability Engineering according to Gartner's 2023 Hype Cycle

Gartner recently published its Hype Cycle for Site Reliability Engineering, 2023, report. This blog reviews the future of site reliability engineering based on Gartner’s Hype Cycle. Additionally, the OnPage team is pleased that Gartner mentioned OnPage as a sample vendor in the Automated Incident Response category.