
The latest news and information on incident management, on-call, incident response, and related technologies.

Optimize Your Services to Save Time, Money, and Sleep - July 2019

Jul 25, 2019 By PagerDuty In PagerDuty

Interested in how to tune up your services to derive even more benefit from your PagerDuty implementation? Simple changes can have a huge impact on how much time and money you spend.

Keep Your Business Stakeholders Updated While You Save the Day

Jul 25, 2019 By Adam Keller In PagerDuty

Imagine this: An airline encounters a major IT incident in a data center that affects its ticketing system. Behind the scenes, technical responders are scrambling to diagnose and fix the issue. However, because today’s systems are so complex, the issue is taking longer than expected to resolve, and hours have passed since the system went down. Meanwhile, passengers are stranded, taking their anger out on customer service agents and sharing their frustrations on social media.

Intent-based Capacity Planning and Autoscaling with Kubernetes

Jul 24, 2019 By Gigi Sayfan In Squadcast

Intent-based Capacity Planning is Google's approach to declaring the reliability intent for a service and then dynamically solving for the most efficient resource allocation plan. Learn how you can start using this approach to manage the reliability of the services running on your Kubernetes cluster.

Reducing MTTR in the Field: 10 Simple Steps Using Retrace

Jul 24, 2019 By Ben Munat In Stackify

The last decade has ushered in a golden era of software engineering. The rise of cloud computing freed companies from managing their own data centers and provided on-demand scaling. These services allow for provisioning servers on the fly using configuration and code. Treating that task as just another type of software development led to the advent of DevOps.

6 Best Practices For Outstanding Critical Incident Management

Jul 24, 2019 By Noam Morginstin In Exigence

"Businesses need to face the inevitability of being hacked at some point. It's not a question of if, but when, and that's why being proactive to minimize the risk is essential." (Robert Egan) When a critical incident hits, what happens to an organization without an efficient incident management plan? Essentially, all stakeholders are left "fighting fires," trying to recover their systems and get their business back up and running.

What is BigPanda?

Jul 23, 2019 By BigPanda In BigPanda

The BigPanda Autonomous Operations platform helps overwhelmed and understaffed IT Ops and NOC teams detect, investigate, and resolve IT incidents faster and more easily.

Four Healthcare Workflows for Better Clinical Communications

Jul 23, 2019 By Christopher Gonzalez In OnPage

Healthcare organizations strive to enhance patient experience, ensuring that patients receive proper treatment at the right time, every time. However, due to antiquated communication tools, such as the pager, this goal is often difficult to achieve for some healthcare providers. Today’s healthcare facilities require an advanced pager replacement solution, integrating with intelligent scheduling systems and EMR solutions for better patient outcomes.

Calculating MTTR: An Evolution Driven by the Rise of DevOps

Jul 22, 2019 By Ben Munat In Stackify

The shift to cloud computing and the DevOps revolution have fueled some important changes in the way we think about software development and monitoring. They have delivered huge benefits to the companies that have fully embraced the approach. In fact, the DevOps Research and Assessment (DORA) 2018 industry survey found a small new group of “elite” performers that deploy code far more often and achieve a far better mean time to resolution (MTTR) than the next closest group.

What Is MTTR? A Simple Definition That Will Help Your Team

Jul 19, 2019 By Ben Munat In Stackify

Mean time to resolution (MTTR) is defined as the total amount of time that service was interrupted divided by the number of individual incidents. The result is a duration, so its unit is a unit of time. Ideally, you can use minutes as the unit. That is, unless you blacked out the eastern seaboard for weeks!
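
The definition above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name `mean_time_to_resolution` and the three outage durations are hypothetical and do not come from the original post.

```python
from datetime import timedelta


def mean_time_to_resolution(outages):
    """Return total interruption time divided by the number of incidents."""
    if not outages:
        # With zero incidents the division is undefined.
        raise ValueError("MTTR is undefined when there are no incidents")
    total_downtime = sum(outages, timedelta())
    return total_downtime / len(outages)


# Hypothetical outage durations for three incidents (illustrative data only).
incidents = [
    timedelta(minutes=30),
    timedelta(minutes=45),
    timedelta(minutes=15),
]

mttr = mean_time_to_resolution(incidents)
print(mttr.total_seconds() / 60)  # prints 30.0 (minutes)
```

Here 90 minutes of total interruption across three incidents gives an MTTR of 30 minutes.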

HBO's "Chernobyl": Is there a lesson here for IT incident management?

Jul 18, 2019 By Haim Snir In BigPanda

I’m a big fan of historical TV dramas, and last week I finished watching the stunning and shattering HBO miniseries about the 1986 Chernobyl disaster. As a monitoring expert and a product manager, I have visited dozens of IT operations centers, control rooms, and NOCs, so I couldn’t help but compare them to the Chernobyl control room scenes in the show.
