Operations | Monitoring | ITSM | DevOps | Cloud

SRE

The latest News and Information on Service Reliability Engineering and related technologies.

8 Tips to incorporate the voice of the customer in your story grooming/sprint planning

Creating successful products and projects goes beyond just great ideas and flexible processes. It's about truly understanding and listening to your customers.Attentively listening to their wants and needs unlocks invaluable insights that can revolutionize your story planning and project execution. In this blog, we'll look at easy but powerful tips to use the customer's input during story planning.

Take back control of your Monitoring

The challenges in the monitoring world are known widely. We all know about these problems, what they are, and why they are important. While each one of the problems has its own solution, it all boils down to one thing – COST. How do we balance the tradeoffs without worrying about the huge costs of solving these challenges? For high-precision monitoring and observability, you need efficient and high-precision control levers. Take back control of your Monitoring with Levitate - a managed time series data warehouse.

What Is Site Reliability Engineering? Understanding the complexities of this crucial function

Site reliability engineers manage a lot, and often in incredibly high-stakes environments. Remember that scene from "The Matrix" where Neo dodges bullets in slow motion? Of course you do. As an SRE, it can feel like you're the person getting hit by those bullets, frantically trying to investigate performance issues, automate away toil, and support the engineers around you, all before the next wave of attacks.

Share highly customizable Blameless Retrospectives as ServiceNow Problems

For many organizations, ServiceNow is a crucial platform to run and scale your organization across all departments. Many organizations’ engineering teams have been relying on ServiceNow Incident and Problem Management. Despite that, many have been experiencing a growing volume of incidents hindering their ability to scale not only their incident response but also their retrospective operations, potentially compromising their data governance and compliance requirements.

Understanding Chaos Engineering and its Benefits

In today's fast-paced technological landscape, ensuring the resilience and dependability of systems is crucial. This is where Chaos Engineering comes in, transforming how organizations approach system testing and fortification. Chaos Engineering helps find vulnerabilities that could go undetected under normal circumstances by purposefully introducing controlled interruptions and failures.

26 DevOps Automation Tools that SaaS Loves in 2023 | Blameless

DevOps is a term combining “development” and “operations”. It involves the use of tools and processes to minimize the time and effort spent on software creation and maintenance. Many DevOps technologies use automation to reduce manual tasks. These DevOps automation tools sometimes use AI-based technology to remove human-based operations, or simpler scripting and processing. This increases speed in feedback and performance between development and operations departments.

Improve Visibility and Capture More Data with Triage Incidents

As new incidents emerge, there are often many unknowns about the size, severity, and cause of the problem. Sometimes it’s not clear if the problem is an incident at all. That’s where introducing a triage stage to your incident management process can help. In this post, we’ll look at the benefits of adding a triage layer to your incident management, and how Rootly’s Triage feature allows you to seamlessly transition from triage to real incident (or false alarm).