
The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Video AMA: Ana Medina

Ana is currently working as a Chaos Engineer at Gremlin, helping companies avoid outages by running proactive chaos engineering experiments. She last worked at Uber, where she was an engineer on the SRE and Infrastructure teams, focusing specifically on chaos engineering and cloud computing. Catch her tweeting at @Ana_M_Medina, mostly about traveling, diversity in tech, and mental health.

Improving MSP Incident Alert Management

As the big game approaches this Sunday, I’ve been thinking about the NFL’s introduction of instant replay and how it makes the league much more enjoyable! Whether you’re rooting for the Patriots led by Tom Brady … or the Rams, you can’t deny that instant replay makes every Super Bowl much more efficient and adds more clarity to the game.

January 2019 Product Update: New Integrations & APIs

To kick off the year, we’re launching a monthly blog series to share new product announcements on an ongoing basis. This month, we’re excited to announce several new integrations, as well as the new global events rule API that empowers admins and developers to easily manage event rules at scale. (Be sure to also check out our platform release notes to stay up-to-date on what’s new.)

Say Hello to Enhanced Role Based Access Control (RBAC) from BigPanda

Role-based access control (RBAC) has become one of the main methods for system access control within large enterprises, assigning access to users based on their role in the organization. Employees are allowed to access only those resources that are necessary to effectively perform their assigned job duties.
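To make the idea concrete, here is a minimal sketch of a role-based check. It does not reflect BigPanda's actual implementation or API: the role names, permission strings, and the `is_allowed` helper are all hypothetical and chosen only for illustration.

```python
# Minimal RBAC sketch. All role names, permission strings, and users
# below are hypothetical examples, not taken from any real product.

# Each role grants a fixed set of permissions.
ROLE_PERMISSIONS = {
    "viewer": {"incidents:read"},
    "responder": {"incidents:read", "incidents:acknowledge"},
    "admin": {"incidents:read", "incidents:acknowledge", "rules:edit"},
}

# Users are assigned roles, never individual permissions.
USER_ROLES = {
    "alice": {"admin"},
    "bob": {"responder"},
}


def is_allowed(user: str, permission: str) -> bool:
    """Return True if any role assigned to the user grants the permission."""
    roles = USER_ROLES.get(user, set())
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in roles)


if __name__ == "__main__":
    print(is_allowed("bob", "incidents:acknowledge"))  # True
    print(is_allowed("bob", "rules:edit"))             # False
```

Because access is attached to roles rather than to individual users, changing what a job function may do means editing one role entry instead of every affected user. Unknown users and unknown roles are denied by default.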

Dashbird announces incident management platform

Since the beginning of Dashbird, we’ve been conducting user interviews with every user who takes the time to jump on a call with us. One of the most common requests we get is the ability to customise alerts - specifically, to choose which failures you get notified about and to set custom alerts based on metrics. Today we announce a new part of Dashbird that takes care of that - an incident management platform.

The Cost of Operational Immaturity

Digital operational maturity is defined as an organization’s effectiveness at real-time work and its ability to focus on performance metrics that improve as the organization becomes more adept at responding to incidents. Based on extensive research and nine years of industry data, in conjunction with a survey of 600+ respondents from across industries, PagerDuty developed a model that identifies four levels of operational maturity.