
Incident Management

The latest news and information on incident management, on-call, incident response, and related technologies.

Incident Response with AWS Systems Manager

The typical DevOps on-call engineer responds to alerts, triages them by service impact, troubleshoots high-priority incidents, and takes action to remediate issues. Automation tools like AWS Systems Manager can take over much of the repetitive work, leaving engineers free to focus on the most important tasks.

Can You Trust Machine Learning In IT Operations?

Chronically understaffed and constantly stressed-out IT Ops and NOC teams are overwhelmed by today’s IT noise. Artificial Intelligence (AI) and Machine Learning (ML) can help these teams because they are exceptionally good at processing enormous volumes of very complex data in real time, or near real time, and surfacing actionable insights.

Reduce IT downtime with incident management

In the IT world, if a server can fail or traffic can overload the network, it will. And the consequences of downtime are significant. Many IT organizations face database, hardware, and software downtime that can last for short periods or shut down the business for days. According to Gartner, the average cost of network downtime alone is $5,600 per minute. What measures can organizations take to reduce IT downtime?

AWS: Operations Health and Best Practices

The ITOps world is a harsh working environment where ITOps personnel are expected to minimize the business impact of incidents at all hours of the day, regardless of the impact on themselves or their families. As more companies undergo digital transformation, the number of alerts and interruptions flowing to IT first responders will continue to increase.

PagerDuty Launches New AWS Integrations for CloudWatch, GuardDuty, CloudTrail, and Personal Health Dashboard

As you may expect from a company founded by former Amazon employees, PagerDuty has been helping AWS users automatically turn any signal into the right insight and action for years. Our Amazon CloudWatch integration enables teams to proactively mitigate customer-impacting issues, which in turn allows organizations to innovate and scale both their AWS and hybrid environments with confidence.

Uptime During the Holiday Shopping Season

In the United States, it’s almost that time of year again when we count our blessings and give thanks. For retail workers, it’s also the time of year when they prepare for the onslaught of eager shoppers who have waited hours in line to rush into stores and get their hands on doorbuster deals (sometimes knocking down employees in the process).