
Latest News

Whose fault was it anyway? On blameless post-mortems

No one wants to be on the receiving end of the blame game—especially in the wake of a major incident. Sure, you know you were the one who made the final change that caused the incident. And hopefully, it was a small one that didn’t cause any SEV-1s. Still, the weight of knowing you caused something bad should be enough, right? Unfortunately, sometimes fingers get pointed, your name gets called, and suddenly, everyone knows that you’re the person who created more work for everyone.

What Is the Role of an Incident Commander?

For most businesses, managing major incidents can be intimidating. With a swarm of information coming from different directions, keeping things organized and maintaining clear, effective communication is tough. It only gets worse when there's no defined process to follow. This disorganization confuses everyone, delays responses, and increases the incident escalation rate. Enter the incident commander (IC).

Incident response and awareness acceleration: What we can learn from responders to the Queenstown floods

I was visiting Queenstown, New Zealand, last week amid the horrible floods, which escalated quickly. As an incident responder myself, I was amazed by the operation and by how fast responders on the ground acted in evacuating people and clearing the grounds. Over 100 people were evacuated in the middle of the night with zero casualties. A commendable job. Here are some observations I made and what we can learn as incident responders ourselves.

A Journey through the Blameless Resource Library

From the very beginning of Blameless, we had two vital missions. First, to address what we saw as a mounting crisis of reliability by offering a comprehensive, easy-to-use reliability platform. Second, to educate the companies facing this crisis on the fundamentals of incident management, cutting-edge best practices, and the cultural values that sustain learning and growth.

The new principles of incident alerting: it's time to evolve

In the ever-evolving world of software engineering, the landscape is constantly shifting. New technologies emerge, best practices evolve, and how we build and run software continues to change. However, when it comes to incident alerting, it often feels like we're stuck in the past.

Generative AI for IT Operations: Your Questions Answered

IT leaders are thrilled about the potential of Generative AI for IT Operations. But they also want to know how it works, why it works, and what it will do for them before taking the leap and adopting this new technology. Allow me to share my perspective on the hype and the truth behind Generative AI. I’m the Field CTO for BigPanda, Operational Intelligence and Automation driven by AIOps.

Observability Pillars: Exploring Logs, Metrics and Traces

Observability is the ability to measure the internal states of a system by examining its outputs. A system becomes 'observable' when it is possible to estimate its current state using only information from its outputs, namely sensor data. You can use observability data to identify and troubleshoot problems, optimize performance, and improve security. In the next few sections, we'll take a closer look at the three pillars of observability: logs, metrics, and traces.

Alternatives to SMS alerts

While SMS alerts are handy, they also tend to be tricky. Across 120+ countries, we continuously deal with compliance requirements and regulations from vendors, governments, and phone carriers. Other alert channels similar to SMS are a lot less cumbersome and have higher delivery rates. Let’s take a look at the available options for switching away from SMS.