Operations | Monitoring | ITSM | DevOps | Cloud

Latest News

Managing Squadcast resources with our expanded Terraform provider

Hey folks! We’re excited to announce that we’ve vastly expanded the capabilities of our Terraform provider. Previously, it was limited to creating and managing services as a resource. It now covers the entire spectrum of resources available on Squadcast, from creating and managing users and escalation policies to managing SLOs. What does that mean for you?

Blameless Expands Microsoft Partnership to Deliver Faster, More Intuitive Incident Response Collaboration

The world’s leading software engineering teams rely on Blameless during incident management. A key part of our offering is the ability to integrate seamlessly with a customer’s unique tech stack. As such, we value partnerships with companies like Microsoft that enhance our user experience and meet the needs of our customers. We understand how essential it is to integrate with communication tools like Microsoft Teams, because it’s the first place a user goes to start an incident.

What is a Security Operation Center and how do SOC teams work?

With the growing complexity of IT environments, it is essential to have robust security processes that can safeguard them from cyber threats. In this blog, we will explore how security operation centers (SOCs) help you monitor, identify, and prevent cyber threats. This blog covers the following pointers.

What are the four Golden Signals?

When it comes to building reliable and scalable software, few organizations have as much authority and expertise as Google. Their Site Reliability Engineering Handbook, first published in 2016, details their practices to maintain reliability as Google scaled. But when you have over a million servers running thousands of services across more than twenty data centers, how do you monitor them in a consistent, logical, and relevant way?

Round Robin Escalation: An Efficient Way to Distribute On-Call Responsibilities

Nowadays, organizations address a high volume of incidents every day. With so much happening, responders can be overwhelmed and may end up de-prioritizing important incidents. Hence, it is important to have an efficient on-call scheduling and escalation process in place. In this blog, we will explore how Round Robin Escalations can help distribute on-call load and set up efficient on-call schedules. This blog covers the following pointers.

The SRE's Quick Guide to Kubectl Logs

Logs are key to monitoring the performance of your applications. Kubernetes offers a command-line tool called kubectl for interacting with the control plane of a Kubernetes cluster. This tool supports debugging, monitoring, and, most importantly, logging. There are many great tools for SREs. Kubernetes, in particular, supports Site Reliability Engineering principles through its capacity to standardize the definition, architecture, and orchestration of containerized applications.

SRE vs. DevOps: Differences and Similarities

Organizations scramble to adopt new frameworks and methodologies to make their software more scalable, and they need to do so reliably, without causing more problems. Enter Site Reliability Engineering (SRE), a set of practices introduced by a Google engineer. But how does it stack up to frameworks like DevOps? DevOps and SRE both enhance the software development and product release cycle.