SRE

The latest news and information on Site Reliability Engineering and related technologies.

8 Tips to incorporate the voice of the customer in your story grooming/sprint planning

Jul 18, 2023 By Anjali Udasi In Zenduty

Creating successful products and projects goes beyond just great ideas and flexible processes. It's about truly understanding and listening to your customers. Attentively listening to their wants and needs unlocks invaluable insights that can revolutionize your story planning and project execution. In this blog, we'll look at easy but powerful tips to use the customer's input during story planning.


Take back control of your Monitoring

Jul 18, 2023 By Last9 In Last9

The challenges in the monitoring world are known widely. We all know about these problems, what they are, and why they are important. While each one of the problems has its own solution, it all boils down to one thing – COST. How do we balance the tradeoffs without worrying about the huge costs of solving these challenges? For high-precision monitoring and observability, you need efficient and high-precision control levers. Take back control of your Monitoring with Levitate - a managed time series data warehouse.


What is OpenTelemetry Collector

Jul 17, 2023 By Last9 In Last9

What the OpenTelemetry Collector is, its architecture, deployment, and getting started.


How JCB is leveraging SRE to lead a successful digital transformation

Jul 15, 2023 By Shimpei Sasano In Google Operations

How JCB improves team structure, risk management, and application and platform development.


InfluxDB vs. Thanos

Jul 14, 2023 By Prathamesh Sonpatki In Last9

InfluxDB vs Thanos: Overview, Pros and Cons, and Differences.


What Is Site Reliability Engineering? Understanding the complexities of this crucial function

Jul 14, 2023 By incident.io In Incident.io

Site reliability engineers manage a lot, and often in incredibly high-stakes environments. Remember that scene from "The Matrix" where Neo dodges bullets in slow motion? Of course you do. As an SRE, it can feel like you're the person getting hit by those bullets, frantically trying to investigate performance issues, automate away toil, and support the engineers around you, all before the next wave of attacks.


Share highly customizable Blameless Retrospectives as ServiceNow Problems

Jul 13, 2023 By Nicolas Philip In Blameless

For many organizations, ServiceNow is a crucial platform for running and scaling operations across all departments. Many engineering teams rely on ServiceNow Incident and Problem Management. Despite that, many are seeing a growing volume of incidents that hinders their ability to scale not only their incident response but also their retrospective operations, potentially compromising their data governance and compliance requirements.


Understanding Chaos Engineering and its Benefits

Jul 12, 2023 By Anjali Udasi In Zenduty

In today's fast-paced technological landscape, ensuring the resilience and dependability of systems is crucial. This is where Chaos Engineering comes in, transforming how organizations approach system testing and fortification. Chaos Engineering helps find vulnerabilities that could go undetected under normal circumstances by purposefully introducing controlled interruptions and failures.


26 DevOps Automation Tools that SaaS Loves in 2023

Jul 12, 2023 By Emily Arnott In Blameless

DevOps is a term combining “development” and “operations”. It involves the use of tools and processes to minimize the time and effort spent on software creation and maintenance. Many DevOps technologies use automation to reduce manual tasks. These DevOps automation tools sometimes use AI-based technology to replace manual operations, and sometimes rely on simpler scripting and processing. This speeds up feedback between development and operations departments and improves performance.


Improve Visibility and Capture More Data with Triage Incidents

Jul 12, 2023 By Ashley Sawatsky In Rootly

As new incidents emerge, there are often many unknowns about the size, severity, and cause of the problem. Sometimes it’s not clear if the problem is an incident at all. That’s where introducing a triage stage to your incident management process can help. In this post, we’ll look at the benefits of adding a triage layer to your incident management, and how Rootly’s Triage feature allows you to seamlessly transition from triage to real incident (or false alarm).
