Alerting

Using a Status Page in Your Incident Response Process

A status page is a communication tool that displays the current working status of your services: fully functional, partially degraded, severely affected, and so on. You define the nomenclature for each service status. On the status page, you can also access and update the uptime and incident history data for all your internal-facing or customer-impacting components.
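To make the idea concrete, here is a minimal sketch of the data a status page might track. It is not any vendor's API: the names `STATUS_LEVELS`, `Component`, `Incident`, `report`, and `overall_status` are all hypothetical, and the three status labels are just the examples mentioned above, since the nomenclature is yours to define. Uptime tracking is left out for brevity.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical, user-defined status labels, ordered from best to worst.
STATUS_LEVELS = ["fully functional", "partially degraded", "severely affected"]


@dataclass
class Incident:
    """One entry in a component's incident history."""
    summary: str
    status: str
    opened_at: datetime


@dataclass
class Component:
    """A service shown on the status page (internal-facing or customer-impacting)."""
    name: str
    customer_impacting: bool
    status: str = STATUS_LEVELS[0]
    history: list = field(default_factory=list)

    def report(self, summary: str, status: str) -> None:
        """Update the current status and append the change to the incident history."""
        if status not in STATUS_LEVELS:
            raise ValueError(f"unknown status: {status!r}")
        self.status = status
        self.history.append(Incident(summary, status, datetime.now(timezone.utc)))


def overall_status(components: list) -> str:
    """Return the worst status across all components (assumed roll-up rule)."""
    return max(
        (c.status for c in components),
        key=STATUS_LEVELS.index,
        default=STATUS_LEVELS[0],
    )
```

For example, after `api = Component("API", True)` and `api.report("Elevated latency", "partially degraded")`, calling `overall_status([api])` returns `"partially degraded"`, and `api.history` holds one `Incident`. Rolling up to the worst component status is one possible design choice; a real page might weight customer-impacting components differently.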

Lessons in Building Well-Formed Scrum and Kanban Teams

In the early days of Amazon, Jeff Bezos set a rule: teams shouldn’t be larger than what two pizzas can feed, no matter how large a company gets. Setting this rule of small teams meant individuals spent less time providing status updates to each other and more time actually getting stuff done. It also allowed team members more time to focus on continuous improvement. PagerDuty, like Amazon, has a strong culture of continuous improvement.

See Your PagerDuty Account Clearly in 2020

What better way to start off the new year than reflecting on the past 12 months and conducting a retrospective of your organization’s systems, processes, and culture? For instance, what did your overall incident response look like in 2019? Was it a smooth, streamlined process, or did chaos reign during incident conference calls? If burning sage and holding magic crystals don’t refresh your office vibes or your incident response process, PagerDuty University has you covered.

The Role of Live Event Notifications in Your Incident Response Plan

According to a study from the University of Maryland, a hacking attack occurs every 39 seconds. During a quick coffee break, your systems could be attacked up to a dozen times. Depending on how your alerts are set up, you might miss a dozen or more notifications. Missed or delayed alerts, and the resulting slow responses, provide attackers with more time. Every minute provides attackers another opportunity to damage your systems or steal your data.
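The coffee-break figure follows from simple arithmetic on the 39-second interval. The sketch below works it through; the function names are hypothetical, and it assumes attacks arrive at an even rate of one every 39 seconds, which is a simplification of whatever the study measured.

```python
ATTACK_INTERVAL_SECONDS = 39  # figure quoted in the text


def attacks_during(minutes: int) -> int:
    """Number of full 39-second intervals that fit in the given number of minutes."""
    return (minutes * 60) // ATTACK_INTERVAL_SECONDS


def minutes_for(attacks: int) -> float:
    """Minutes needed for the given number of attacks at one per 39 seconds."""
    return attacks * ATTACK_INTERVAL_SECONDS / 60


print(attacks_during(8))   # attacks during an 8-minute break
print(minutes_for(12))     # minutes for a dozen attacks
```

Under that assumption, a dozen attacks take 12 × 39 = 468 seconds, or just under eight minutes, which is consistent with "up to a dozen" during a short break.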

Making Observability Actionable at Scale - Sisir Koppaka | DBS DevConnect 2019

Many organisations already possess a vast amount of existing data about production systems. As customer expectations evolve, organisations are often challenged to find more proactive ways of dealing with traditionally reactive incident response activity. In this talk, we discuss approaches to unlock value from this data by making it truly actionable.