The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.
Atri Mandal, HEAL’s AI/ML expert, has written a second blog about the 4P strategy, this time focusing primarily on solution recommendation, which gives SREs useful suggestions on how to fix problems proactively.
We’re excited to announce a new set of updates and enhancements to PagerDuty’s Digital Operations Platform. Recent updates from the product team range from On-Call Management and Incident Response and Process Automation to PagerDuty Community & Advocacy Events. New capabilities enable users and customers to resolve incidents faster, and more.
Business continuity is a crucial part of any scalable operations plan, but many businesses fail to realize how important it is until their first critical emergency. Only then does business continuity management come to the forefront of planning exercises, and stakeholders are forced to reflect on what went wrong and why, and to determine whether they can prevent it from happening again or be better prepared if it does. The true business continuity management lifecycle begins long before an incident.
Site Reliability Engineering (SRE) teams and Platform Engineering teams share similar goals, like maximizing automation and reducing toil, and similar methodologies. But they have different priorities, and use somewhat different tools to achieve them. What are SREs, what are platform engineers, and how are the two roles similar and different? This article explains.
Sometimes heritage is better than new. Some people favor Coca-Cola Classic over New Coke, and heirloom tomatoes over regular tomatoes. Some Luddites might say the same thing about cloud computing. “I won’t put my (app/data) in the cloud! It will be more (secure | reliable | cheaper) if I run it myself in my own data center.”
When building an incident response process, it’s easy to get overwhelmed by all the moving parts. Less is more: focus first on building solid foundations that you can develop over time. Here are three things we think form a key part of a strong process. I’d recommend taking these one at a time, introducing incident response throughout your organisation. Just being honest: we’re a startup selling incident management software.