Latest News

What are Blameless Retrospectives? How Do You Run Them?

Most engineering organizations agree that in complex systems, failure is inevitable. It's possible to prevent certain incidents from recurring, reduce their impact, or shorten the time to resolution. However, it's impossible to avoid them altogether. In the past, we assumed that failures were the result of people's mistakes. It was all about "the bad apple theory": finding the "guilty party" and removing them to prevent future failures.

Incident Response Team | Roles & Responsibilities Defined

When your organization faces outages, errors, security breaches, and other incidents, you need a plan in place so you can take appropriate action. You also need a capable team of experts filling critical roles and responsibilities who can execute that plan and collaborate effectively to resolve issues quickly. An incident response team should therefore be built to avoid gaps in expertise.

Incident Management Automation - What You Should Know

Automated incident management is the process of automating incident response so that critical events are detected and addressed as efficiently and consistently as possible. In incident management, time is of the essence, and the primary benefit of automation is speed. With automation, you can complete time-consuming tasks much more quickly. This reduces incident response time and lets the team focus on matters that require their expertise.

A better Grafana OnCall: Seamless workflows with the rest of Grafana Cloud

Incident response and management (IRM) doesn’t happen in a vacuum. Your ability to respond to issues in a timely manner depends greatly on how well your on-call engineers can use their IRM tooling and observability tools together to understand what changed and why.

PagerDuty Study Reveals Security Concerns Are Slowing Adoption of GenAI Among the World's Largest Companies

98% of top tech execs paused their corporate genAI initiatives to establish policies. Execs say that a trusted technology partner is key to incorporating genAI into their organizations.

Turn tickets into actionable alerts with ilert integration for HaloPSA and HaloITSM

At ilert, we are dedicated to providing an effortless, seamless connection between our incident management platform and other popular tools that empower teams to excel in operations. We're excited to introduce two new integrations from the Halo suite: HaloITSM and HaloPSA.

How to Keep Observability Alive in Microservice Landscapes through OpenTelemetry

Observability has become a cornerstone of system reliability and efficiency in modern software engineering and operations. Beyond its traditional scope of logging, monitoring, and tracing, observability can also be defined in terms of incident response efficiency: specifically, how long it takes a team to grasp the full context and background of a technical incident.

7 Key Takeaways from HIMSS 2024

The Healthcare Information and Management Systems Society (HIMSS) conference serves as a beacon for the healthcare industry, showcasing the latest innovations and trends shaping the future of healthcare. In 2024, HIMSS once again brought together industry leaders, innovators, and stakeholders to explore the transformative potential of technology in healthcare. In this blog post, we look at the significant trends, challenges, and insights that surfaced during our three days at HIMSS in Orlando.

Building trust through incident communication with Adrián Moreno, VP of Engineering at SumUp

Today, good incident communication isn't a nice-to-have; it's an absolute must. But where do you even start? To help answer that question, we sat down with Adrián Moreno Peña, VP of Engineering at SumUp, to get his perspective on how organizations of all sizes can deliver stellar comms no matter the situation. We discuss.

Finding the common ground with executives in incidents

I spotted a thread on Reddit discussing the pain of executives dropping into incidents and the impact this can have on the incident response process. Since it came from an SRE community, the account was a little one-sided. So let's take a closer look at what it takes to make incidents better for responders and executives alike.