
Incident Management

The latest news and information on incident management, on-call, incident response and related technologies.

Establishing Zero Trust out of the box at Enterprise scale

At most enterprises, CIOs are already multiple waves into enforcing Zero Trust policy across their processes, configurations and teams. As a DevOps Lead, you may find it tricky to juggle user empowerment and adherence to your executive’s policy across many SaaS tools. This problem is especially challenging in incident management, where highly sensitive data is shared, incidents rely on many different types of team members, and response teams fluctuate from incident to incident.

The fastest and most robust path to incident declaration from monitoring tools

Here’s a crazy question: why do we still require a human to manually declare an incident for the things that we know are incidents? If we have enough confidence to build SLOs and high-severity alert routes for these specific scenarios, why are we still asking a human to confirm it’s an incident and get the assembly process in motion? Isn’t that just another button to push when we could be problem solving instead?
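As a rough illustration of the idea, the sketch below declares an incident automatically when an alert matches a scenario already known to be an incident. Everything in it is hypothetical: the `Alert` and `Incident` types, the severity labels, and the `maybe_declare_incident` function are assumptions for illustration, not the API of any monitoring or incident management product.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed severity labels for alerts we already trust to mean "incident".
AUTO_DECLARE_SEVERITIES = {"critical", "high"}


@dataclass
class Alert:
    source: str         # monitoring tool that fired the alert (hypothetical field)
    severity: str       # severity label assigned by the alert route
    slo_breached: bool  # True when the alert is tied to an SLO breach
    summary: str


@dataclass
class Incident:
    title: str
    severity: str
    declared_by: str


def maybe_declare_incident(alert: Alert) -> Optional[Incident]:
    """Declare an incident without human confirmation for known scenarios.

    Returns None when the alert does not qualify, leaving the decision
    to a human responder.
    """
    if alert.severity in AUTO_DECLARE_SEVERITIES and alert.slo_breached:
        return Incident(
            title=alert.summary,
            severity=alert.severity,
            declared_by=f"auto:{alert.source}",
        )
    return None


alert = Alert("monitoring", "critical", True, "Checkout SLO burn rate exceeded")
incident = maybe_declare_incident(alert)
```

Alerts that fall outside the trusted set still return `None`, so a human keeps the final say on anything ambiguous.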

Insights into Observability Tools: Commercial vs. Open-Source

Observability has become a critical aspect of modern software development and operations, allowing organizations to gain insights into the health and performance of their applications and systems. One of the key decisions when implementing observability is choosing between commercial or open-source tools. We spoke to several professionals who shared their experiences and insights on this topic, shedding light on the pros and cons of each approach.

Process Automation v4.12.0 and v4.13.0 Release Notes

Product Managers Jake Cohen and Forrest Evans are back to update us on what’s new in the 4.12.0 and 4.13.0 releases of PagerDuty Process Automation. New in these releases are features to support Kubernetes automation, managing resources in multiple AWS accounts, and a new plugin suite for Sensu.

Major Incident Management with Zenduty, Grafana, Slack and Zendesk

In today's fast-paced world, businesses are looking for ways to increase their efficiency and simplify their processes. But there are times when teams are unaware of an issue in its early stages, which leads to a bad customer experience. For example, suppose you are part of the Infrastructure team, where your primary responsibility is to monitor resources and notify others when they reach maximum capacity. Let's say that, due to an anomalous traffic load, CPU utilization on one of your resources goes above 90%.
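The capacity check in this scenario can be sketched in a few lines. This is a minimal illustration only: the 90% threshold comes from the example, while the function name `capacity_alert` and the resource name `web-1` are hypothetical and not tied to Zenduty, Grafana, Slack or Zendesk.

```python
from typing import Optional

# Threshold taken from the scenario: notify when CPU utilization goes above 90%.
CPU_THRESHOLD_PERCENT = 90.0


def capacity_alert(
    resource: str,
    cpu_percent: float,
    threshold: float = CPU_THRESHOLD_PERCENT,
) -> Optional[str]:
    """Return a notification message if CPU utilization exceeds the threshold."""
    if cpu_percent > threshold:
        return (
            f"{resource}: CPU utilization {cpu_percent:.1f}% "
            f"exceeds {threshold:.1f}%"
        )
    # At or below the threshold there is nothing to report.
    return None


message = capacity_alert("web-1", 95.0)
```

In practice a check like this would usually require the breach to persist for some time before notifying, so that a brief spike does not page anyone.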

7 Types of Incident Response Tools

Incident response tools are software applications or platforms designed to assist security teams in identifying, managing, and resolving cybersecurity incidents. Incident response is a crucial part of an organization’s cybersecurity strategy, making it possible to detect threats, analyze vulnerabilities, respond to attacks, and recover from security breaches. Incident response tools are vital for safeguarding organizations against evolving cyber threats.

Welcome To xMatters - Ep 2 - Organizing Your Teams

Even the most gifted and powerful people could do with a helping hand now and again. Thankfully, they are not alone in the multiverse! xMatters makes organizing your teams and creating a customized on-call schedule work as if by magic. This way, when help is urgently needed, the appropriate on-call individual will quickly join the team to save the day. To learn more about organizing your teams with xMatters, check out our tutorial videos on how to get started.