Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on Service Reliability Engineering and related technologies.

Maximize efficiency with Terraformer: Manage Squadcast resources via IaC

Ever since Terraform was first launched by HashiCorp, infrastructure teams have been quick to leverage its functionality, because deploying infrastructure via code became so much easier and less error-prone. It quickly became a great way to deploy new infrastructure with custom configurations, but what about managing cloud infrastructure that is already defined? Can Terraform be used to make changes to it? Or can it be used to deploy the same configurations to new environments?

Assessing Observability Maturity at Danske Bank

To ensure reliability, IT operations teams today require a deeper understanding of systems than monitoring alone can provide. In this session, you'll hear insights from Danske Bank about how their observability journey started, the obstacles encountered along the way, what they've achieved in observability so far and, finally, how they measure the maturity of their observability practice.
Sponsored Post

SRE Best Practices

Site Reliability Engineering (SRE) is a practice that emerged at Google because of its need for highly reliable and scalable systems. SRE unifies operations and development teams and implements DevOps principles to ensure system reliability, scalability, and performance. There's plenty of documentation on tactics for adopting automation and implementing infrastructure as code, but practical ops-focused SRE best practices based on real-world experience are harder to find. This article will explore 6 SRE best practices based on feedback from SREs and technical subject matter experts.

Introduction to Kubernetes Imperative Commands

Kubernetes was born out of the need to make complex applications highly available, scalable, portable, and independently deployable as small microservices. It also makes it easier to adopt DevOps processes and helps you set up modern Incident Response strategies to enhance the reliability of your applications.

Plesk 360 + Squadcast: Alert Routing Made Easy

Plesk is a popular web hosting platform that makes it easier for administrators to set up and manage websites. Its Plesk 360 offering empowers users to monitor and manage servers more effectively. Features such as fully integrated site and server monitoring help users keep track of performance and prevent downtime.

Tagging & Routing at Squadcast | Incident Management | Squadcast

Event Tagging is a rule-based auto-tagging system that lets you define customized tags based on incident payloads. These tags are automatically assigned to incidents when they are triggered. Auto-add relevant information like priority, severity, or alert type to make incoming incidents context-rich, and route alerts to the right responder(s) based on the tags they carry.
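
As a rough illustration of the idea described above, here is a minimal Python sketch of rule-based tagging followed by tag-based routing. It is not Squadcast's API or rule syntax: `TagRule`, `apply_tag_rules`, `route`, the payload fields, and the responder names are all hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass
class TagRule:
    """Hypothetical rule: if payload[key] equals expected, attach these tags."""
    key: str
    expected: str
    tags: dict


def apply_tag_rules(payload, rules):
    """Collect the tags from every rule that matches the incident payload."""
    tags = {}
    for rule in rules:
        if payload.get(rule.key) == rule.expected:
            tags.update(rule.tags)
    return tags


def route(tags, routes, default):
    """Return the first responder whose required tags are all present."""
    for required, responder in routes:
        if all(tags.get(k) == v for k, v in required.items()):
            return responder
    return default


# Example data (assumed for illustration only).
rules = [
    TagRule("source", "database", {"severity": "critical", "team": "dba"}),
    TagRule("source", "frontend", {"severity": "low", "team": "web"}),
]
routes = [
    ({"team": "dba"}, "dba-on-call"),
    ({"team": "web"}, "web-on-call"),
]

incident_tags = apply_tag_rules({"source": "database"}, rules)
responder = route(incident_tags, routes, default="general-on-call")
```

In this sketch an incident whose payload matches no rule carries no tags and falls through to the default responder.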

Escalation Policy | Round Robin & Advanced Escalations | Incident Assignment Strategies | Squadcast

An escalation policy is a collection of rules used to define how and when an incident should be escalated. In Squadcast, an incident escalation happens when a responder hands off the task or incident to another member, and this handoff is subject to specific rules. This video explains how to set up Escalation Policies and the Round Robin Incident Assignment Strategy in Squadcast.
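
To make the two concepts concrete, the following Python sketch shows one possible way to model round robin assignment and a time-based escalation policy. It does not reflect how Squadcast implements either feature: the `RoundRobinAssigner` class, the `escalation_targets` function, the delay-in-minutes rule format, and the responder names are assumptions made for illustration.

```python
from itertools import cycle


class RoundRobinAssigner:
    """Hypothetical assigner that hands each new incident to the next responder in turn."""

    def __init__(self, responders):
        if not responders:
            raise ValueError("at least one responder is required")
        self._cycle = cycle(responders)

    def assign(self):
        """Return the responder for the next incident."""
        return next(self._cycle)


def escalation_targets(policy, minutes_unacknowledged):
    """Return, in escalation order, every target whose delay has elapsed.

    policy is a list of (after_minutes, target) rules (an assumed format).
    """
    return [
        target
        for after_minutes, target in sorted(policy)
        if minutes_unacknowledged >= after_minutes
    ]


# Example data (assumed for illustration only).
assigner = RoundRobinAssigner(["alice", "bob"])
assignments = [assigner.assign() for _ in range(3)]

policy = [(0, "primary"), (15, "secondary"), (30, "manager")]
notified = escalation_targets(policy, minutes_unacknowledged=20)
```

Here the third incident wraps back around to the first responder, and an incident left unacknowledged for 20 minutes has reached the first two escalation rules but not the third.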

Integrating Microsoft Teams & Squadcast - Acknowledge, Resolve & Reassign Incidents | Squadcast

Teams using MS Teams can now integrate with Squadcast and easily acknowledge, resolve, and reassign incidents directly from MS Teams. You can configure Squadcast to send a notification to the configured MS Teams channel as soon as an incident is triggered.

Creating Routing Rules | Creating Incident Routing Flows | Alert Routing | Event Tags | Squadcast

Alert Routing allows you to configure Routing Rules that use the event tags attached to alerts to route them to the right responder. This video explains how you can use Routing Rules to create various incident routing flows.