Top 12 Site Reliability Engineering (SRE) Tools

Ben Treynor Sloss, then VP of Engineering at Google, coined the term “Site Reliability Engineering” in 2003. Site Reliability Engineering, or SRE, aims to build and run scalable and highly available systems. The philosophy behind Site Reliability Engineering is that developers should treat errors as opportunities to learn and improve. SRE teams constantly experiment and try new things to enhance their support systems.

Nastel Recognized as Leader in Integration Infrastructure Management & Transaction Observability by GigaOm

“Nastel is uniquely placed when it comes to understanding the configuration information and message content of messaging middleware and integration infrastructure,” said Saurabh Sharma, GigaOm.

Nastel Technologies, the world’s #1 i2M (Integration Infrastructure Management) company, today announced that it has been rated as a leader in GigaOm’s new Integration Infrastructure Management & Transaction Observability Sonar Report.

Best practices for effective asset tagging in 2022

When your company has hundreds or even thousands of physical assets, it’s essential to know where they are located and what their operational status is. Otherwise, you risk outages and compliance issues that can have serious business consequences. From computer monitors to industrial equipment, tracking and controlling your assets with asset tags is critical to your company’s bottom line.

How to get One-click SCOM Root Cause Analysis

SCOM has incredible powers, but it’s not always easy to find the root cause of issues fast. And you definitely don’t get one-click SCOM root cause analysis. We’ve all been there: a business-critical server goes down and you don’t know why. Imagine you have a dashboard showing the health status of all your server groups, and you notice that the United States is showing as critical.

Introduction to reliability management

Delivering exceptional digital customer experiences is a goal of any modern business. However, managing the reliability of ever more complex applications is a challenge. Developers are releasing new capabilities in fast-moving sprints, and the business wants maximum velocity with minimal risk. SRE teams create a structure of continuous improvement that focuses on ensuring the application is reliable above all else.