Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on Service Reliability Engineering and related technologies.

SRE Report 2023: Findings From the Field - Toil

Toil. Few other words have the same visceral impact for SREs as their four-letter nemesis: toil. Although pretty much everyone recognizes and agrees that toil is bad, it is a term that is frequently misused in colloquial use. In common English usage, toil is defined as “long strenuous fatiguing labor”. As a term of art in the SRE profession, “toil” has several very specific characteristics which distinguish it from other sorts of work which people spend time on.

Using Tagging and Routing Rules in Squadcast I Incident Classification I Event Tagging I Squadcast

Event Tagging is a rule-based, auto-tagging system with which you can define customized tags based on incident payloads, that get automatically assigned to incidents when they are triggered. This video explains how to create Tagging rules for efficient Incident Classification.

Adding Incident Watchers in Squadcast | Incident Notifications and Updates | Squadcast

This video talks about Squadcast's Incident Watchers Feature. In Squadcast, any user/stakeholder can subscribe to an Incident and act as a Watcher for an incident. Incident Watchers can choose to receive notifications for all the updates of an incident. This allows any user/stakeholder to act as an observer of the incident, even if they are not active responders. You can customize your watch options for the incident and receive notifications only for those updates.

SRE Vs. DevOps: A Simple Breakdown Of The Differences

You know this already. Regardless of your size, you must keep up with technological developments in your industry — and, increasingly, in other industries, even those that seem unrelated. Embracing disruption can enable you to increase your market share, revenue, and profit margins. Delegating some development and operations responsibilities to Site Reliability Engineering (SRE) experts allows developers to innovate and create new solutions faster.

Webinar Recap: How Observability Impacts SRE, Development, and Security Teams

In today’s fast paced and constantly evolving digital landscape, observability has become a critical component of effective software development. Companies are relying more on and using machine and telemetry data to fix customer problems, refine software and applications, and enhance security. However, while more data has empowered teams with more insights, the value derived from that data isn’t keeping pace with this growth. So how can these teams derive more value from telemetry data?

Analytics in Squadcast | Visualize Team and Organization Level Analytics | MTTA MTTR | Squadcast

Analyzing incident data plays a key role to do better SRE. Squadcast's Analytics Dashboard helps you analyze the performance of your Organization/ Team, for a given time period. It also gives you more insight into past outages that affected your systems.
Sponsored Post

What are Network Operation Centers (NOC) and how do NOC teams work?

Modern-day markets are highly competitive and in order to foster stronger customer relations, we see businesses striving hard to be always available and operational. Hence, businesses invest heavily to ensure higher uptime and to have dedicated teams that constantly monitor the performance of an organization's IT resources. In this blog, we will explore what NOC teams are and why they are important.

Sponsored Post

Runbook Automation as a Baseline for Controllability and Observability

Some of the highest priorities for engineers - from NOC Engineers, DevOps & Site Reliability Engineers - are the automation and optimization of their production environments. Many companies today face tough challenges with their Network Operations Centers (NOCs) or production environments. These challenges fall into the hands of engineering teams.