Latest News

Service-Based vs. Team-Based Approach: Which Is Better?

How is the incident response process set up at your organization? At PagerDuty, our approach is to look holistically at your infrastructure, your customer-facing applications, and your products. We describe these items as “services” that roll up into a “business service.” This setup allows teams to better manage these services so that when incidents do happen, responders can gain context much faster. But how?

Tips for Modern NOCs - Easing the Pain of Ticket Creation

Manual ticket creation can often be a pain. It’s difficult enough handling the barrage of alerts coming in, let alone opening tickets and copying and pasting alert details into them. In this post, we discuss a simple way to ease this pain and share a video on how to do it.

September 2019 Update: Improved assigning of categories

Our September update improves the assignment of categories to Signl alerts, and with it the enrichment and routing of alerts to the right people. Until now, a ‘Services & Systems’ category was assigned to a Signl alert if at least one of the entered keywords was found in the event content or text delivered to SIGNL4 by email or webhook. This effectively represents a logical ‘OR’ operator for the keyword search.
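
The ‘OR’ behavior described above can be sketched in a few lines. This is only an illustration, not SIGNL4’s actual implementation: the function name `matches_category`, the sample keywords, and the case-insensitive comparison are all assumptions made for the example.

```python
def matches_category(event_text: str, keywords: list[str]) -> bool:
    """Return True if at least one keyword occurs in the event text (logical OR)."""
    # Assumption for this sketch: matching is case-insensitive substring search.
    text = event_text.lower()
    return any(keyword.lower() in text for keyword in keywords)


# Hypothetical keywords entered for a category.
keywords = ["database", "sql", "backup"]

print(matches_category("Nightly backup job failed on host db01", keywords))  # True
print(matches_category("CPU temperature high on host web02", keywords))      # False
```

With an ‘OR’ search like this, a single matching keyword is enough to assign the category, which is why a broad keyword list can match more alerts than intended.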

Large Diamond Mining Organization Adopts OnPage

Diamond mining is recognized as a dangerous occupation, with serious accidents affecting mineworkers across the globe. Often, these incidents turn out to be fatal because the victim didn’t receive immediate care from first responders. However, large international organizations are making significant strides to minimize the impact of these accidents.

Unplanned Work, Part 2: The Impact on the Enterprise

Today, technology problems can alter the trajectory of a business. Minutes of downtime or latency (slow is the new down) cost organizations dearly in lost revenue and can jeopardize customer relationships. However, there’s an even more important consequence of technology problems than top-line risk: reduced innovation as teams are forced into reactive fire drills that take time away from product development.

Announcing our AWS CloudTrail Integration

One of the most common reasons for system failures is changes to the underlying infrastructure. AWS CloudTrail does a great job of recording when actions are taken, but a lot of organizations don’t take advantage of it. FireHydrant now includes this data, giving you visibility into changes to your infrastructure while you’re investigating an incident.

Automating Critical Incident Management; Easier Than You Think

Organizations need to continually ramp up and improve their security and resilience to unexpected incidents. But as the number of endpoints, networks, and user interfaces grows exponentially, the task becomes more difficult, and manual incident response management becomes less and less effective.

Unplanned Work: The Impact on DevOps Teams

Going on call and being awakened at a moment’s notice to put out fires when reputation and revenue are on the line is incredibly stressful. And with DevOps teams under increasing pressure to simultaneously release new products faster while ensuring reliability and quality, burnout is a rapidly growing problem. It’s why #HugOps and empathy are becoming so central to the culture of DevOps.