Incident Management

The latest news and information on Incident Management, On-Call, Incident Response and related technologies.

Best Practices in Incident Management

In an always-on world, companies rely on systems and processes to keep their services up and running at all times. The most important part of maintaining this uptime is having an Incident Management process in place to restore your services in the event of an interruption or unplanned downtime. SRE, DevOps, NOC and other IT teams typically use Incident Management processes to respond to incidents that affect their services and to restore uptime.

The Face of Success: Insights from BigPanda's "IT Ops from Home" Virtual Summit

Close to 400 IT professionals from some of the most prominent enterprises in the retail, financial, technology, pharma and manufacturing industries attended our “Face of IT Ops from Home” virtual conference, which featured a keynote session with Sony PlayStation and State Farm Insurance, and three breakout sessions with Ulta Beauty, AWS and BlackRock.

Assessing the Economic Value of AIOps

Taking economics into account

Most enterprises consider economics when deciding which AIOps platform to purchase. Often, their conception of economics is narrow, reduced to the resolution of three issues: 1) the cost of the technology; 2) its ability to replace human labor; and 3) its ability to displace deployed products and, hence, defray future maintenance and subscription charges. In other words, AIOps economics becomes almost entirely a matter of cost.

Alerts to Incident Response in Three Easy Steps

You may already be using Splunk to ingest data and generate alerts and dashboards so you can act quickly on problems. But did you know that you can start a VictorOps trial and, in three easy steps, have your Splunk alerts escalated to the right teams and people through a mobile app notification, an SMS message or a live phone call?

Elastic Observability in SRE and Incident Response

Software services are at the heart of modern business in the digital age. Just look at the apps on your smartphone. Shopping, banking, streaming, gaming, reading, messaging, ridesharing, scheduling, searching — you name it. Society runs on software services. The industry has exploded to meet demands, and people have many choices on where to spend their money and attention. Businesses must compete to attract and retain customers who can switch services with the swipe of a thumb.

Modern ITSM Solutions: Creativity in Incident Response (Bring Your Own Tools)

The IT landscape is constantly evolving. A tool that is heavily used this month may be virtually obsolete the next. In such a dynamic ecosystem, the way these tools are implemented is unique to every organization. It has therefore become crucial for organizations to build an incident response process that can incorporate any combination of tools, even those that are highly siloed and departmental.

Managing Burnout During COVID-19

During this crisis, we’re all trying our best to keep ourselves and others healthy, manage chaotic homes, and prioritize our mental health. That can be difficult even when we’re not living through a pandemic. With the added stress, burnout is occurring at an alarming rate, driven by the inability to separate home from work, the increased burden of keeping everything running, heavier on-call loads, and strained communication.

PagerDuty and IBM Watson AIOps Team Up to Automate Real-Time Responses

“The way we work has changed forever.” Those are words that our CEO Jennifer Tejada used in her interview with Yahoo Finance a couple of weeks ago. Those words made me stop and think about how much of our customers’ daily work has changed irreversibly. Working from home has changed from a luxury to a necessity, so how do folks in the IT world adapt to this change?