
The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Tracking On-Call Health

If you have an on-call rotation, you want it to be a healthy one. But health is hard to measure because many of its qualities are abstract. For example, are you feeling burnt out? Do you feel properly supported? Is there a sense of impending doom? Do you think everything is under control? Is on-call clashing with your private life? Do you feel adequately equipped to deal with the challenges you may be asked to meet? Is there enough room to recover after incidents?

4 Best Practices for Root Cause Analysis

Failures are a common part of any system’s lifecycle, so what does Root Cause Analysis look like for this type of problem? If you build and deploy a system, chances are high that you'll have to deal with a failure in the near future. What matters is how you handle such failures. As an organization, you need pre-formulated strategies for handling failures as and when they occur.

List of Potential Incident Management Issues

Incident management is the IT service management process for responding to a service disruption and restoring the service to normal as quickly as possible, minimizing the negative impact on the business. An incident is a single unplanned event that generates a service disruption, whereas a problem is a cause or potential cause of one or more incidents, as defined by ITIL incident management guidelines.

Sponsored Post

Major Incident Process Is at the Heart of Effectiveness

Read the new white paper on major incident management. Businesses need to be prepared for minor and major incidents affecting their technologies, be it an integration disconnecting or an entire system being taken offline. Preparation ensures not only that losses are minimized but also that businesses can protect themselves, and potentially their clients, from harmful impacts.

Making waves in IT Ops

It feels a bit surreal stepping into the Regional Vice President of Sales position here at BigPanda just a few months after the company achieved Unicorn status. In more than 15 years of managing enterprise software sales, this is the first time I have known I was going to play a critical role in facilitating a company’s ascension to the top of its sector. Even in college, I knew this was what I wanted.

How StatusCast makes managing incidents smarter in Slack

These days, more and more IT teams spend much of their workday in Slack. It’s essentially a second virtual home. If Slack is your main channel of communication, it stands to reason that you need access to tools, bots, apps, and more directly within the Slack environment. You shouldn’t have to leave your home to get your work done, and you shouldn’t have to leave Slack to communicate with and update your team and your clients.

Now Available on AWS Marketplace: PagerDuty Runbook Automation and PagerDuty Process Automation On Prem

We are excited to announce that PagerDuty® Runbook Automation and PagerDuty® Process Automation On Prem are now available on the AWS Marketplace. AWS, the leading global cloud provider, offers more than 200 different cloud services and makes it simple and attractive to build and grow your cloud-native business or migrate your existing infrastructure to the cloud, so you can begin to take advantage of the scale, agility, and flexibility the cloud offers.