Operations | Monitoring | ITSM | DevOps | Cloud

Latest News

Smartsheet's SRE Team Takes Center Stage As It Hits The 8M User Mark

Smartsheet was founded in 2005 with the mission of helping companies simplify and streamline how work is managed. Over three quarters of the Fortune 500 rely on Smartsheet. Through its enterprise platform for dynamic work, the platform aligns people and technology to help businesses move faster, drive innovation, and achieve more.

Are you an MS Teams shop? We've got you Covered with Blameless Incident Resolution

We have an exciting announcement. Blameless is providing early access to our Microsoft Teams integration. SRE and engineering teams can now resolve incidents faster without leaving the comfort of their favorite messaging tool. With the Blameless incident resolution product, Microsoft Teams users can now reduce toil in routine incident response processes through automation, codify processes with checklists, and craft retrospectives with the ‘add to timeline’ command.

How Lowe's meets customer demand with Google SRE practices

At Lowe’s, we’ve made significant progress in our multiyear technology transformation. To modernize our systems and build new capabilities for our customers and associates, we leverage Google’s SRE framework and Google Cloud, which helps us meet their needs faster and more effectively. With these efforts, we’ve been able to go from one release every two weeks to 20+ releases daily—about 20X more releases per month.

Error Budgets Explained (And How to Make One for Your Team)

Wondering what error budgets (EBs) are and how they are useful? We explain what they are, how they are defined, and how they can help your team. An error budget is the amount of acceptable unreliability a service can have before customer happiness is impacted. If a service is well within its budget, the developers can take more risks in their releases. If not, developers need to make safer choices.

The 7 SRE Principles [And How to Put Them Into Practice]

Whether you're just adopting SRE or optimizing your current processes, we can help. We’ll explain the 7 key principles of SRE and how to put them into practice. So, what are the SRE principles? The fundamental SRE principles are: SRE is a method that operates through principles. Instead of prescribing specific solutions, it guides you with best practices. These SRE principles help organizations decide what's best for them. Once you understand the principles, you can apply them in many areas.

What do site reliability engineers do?

Are you considering adopting SRE? We will explain the roles and responsibilities of an SRE team within your organization, and how to start building one. So what does an SRE team do? An SRE team is responsible for building software that improves the resiliency of systems, implementing fixes, responding to incidents, and automating processes whenever possible. Site reliability engineering is a holistic practice that incorporates various types of work.

Blameless Runbook Documentation is Now Generally Available!

At Blameless, our mission is to provide teams with the tools they need to operationalize SRE and embrace a culture of resilience. We help teams automate toil and adopt best practices across integrated incident management, comprehensive retrospectives, service level objectives, reliability insights, and more. We are very excited to announce that Blameless Runbook Documentation is now generally available for all customers.