The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Reduce IT downtime with incident management

In the IT world, if a server can fail or traffic can overload the network, it will. And the consequences of downtime are significant. Many IT organizations face database, hardware, and software downtime that may last only a short time or may shut down the business for days. According to Gartner, the average cost of network downtime alone is $5,600 per minute. What measures can organizations take to reduce IT downtime?

AWS: Operations Health and Best Practices

The ITOps world is a harsh working environment where ITOps personnel are expected to minimize the business impact of incidents at all hours of the day—regardless of the impact to themselves or their families. As more companies undergo digital transformation, the number of alerts and interruptions flowing to IT first responders will continue to increase.

PagerDuty Launches New AWS Integrations for CloudWatch, GuardDuty, CloudTrail, and Personal Health Dashboard

As you may expect from a company founded by former Amazon employees, PagerDuty has been helping AWS users automatically turn any signal into the right insight and action for years. Our Amazon CloudWatch integration enables teams to proactively mitigate customer-impacting issues, which in turn allows organizations to innovate and scale both their AWS and hybrid environments with confidence.

Uptime During the Holiday Shopping Season

In the United States, it’s almost that time of year again when we count our blessings and give thanks. For retail workers, it’s also the time of year when they prepare for the onslaught of eager shoppers who have waited hours in line to rush into stores and get their hands on doorbuster deals (sometimes knocking down employees in the process).

Meet Opsgenie at AWS re:Invent 2018: Making Incident Response Faster and More Efficient

It’s an exciting time here at Opsgenie. We recently joined the Atlassian family, updated our logo, released new pricing, and now we’re headed to AWS re:Invent 2018! So much has changed since last year’s event and we can’t wait to talk about it in person.

What is MTTR? Critical Incident Recovery Metrics to Reduce Downtime

Whether it’s caused by scheduled maintenance or an unexpected outage, downtime is time your solutions are out of action and unavailable for use. Long or frequent periods of downtime impose significant costs on the company and ultimately undermine customer trust. So what is MTTR? And how can improving MTTR reduce downtime? Below are four key metrics to get you started.