Monitoring that Monitors the Monitors of the Monitors

Oct 12, 2018 By PagerDuty In PagerDuty

One way to break the cycle of alert fatigue is to improve the quality of the signals you monitor. That can mean ingesting and processing monitoring data at a higher resolution, using smarter statistical methods to aggregate and correlate data across multiple services, or routing alerts through an escalation and incident management system.

View Video

This IS NOT Fine: Putting Out (Code) Fires

Oct 12, 2018 By PagerDuty In PagerDuty

So the dumpster is on fire. Again. The site’s down. Your boss’s face is an ever-deepening purple. And you begin debating whether you should join the #incident channel or call an ambulance to deal with his impending stroke. Firefighters have clear procedures and a strong hierarchy. The first truck at a scene immediately begins assessing the situation.

View Video

Another Journey of Chaos Engineering

Oct 11, 2018 By PagerDuty In PagerDuty

Chaos engineering is here to stay. There's a thriving community, numerous open source projects, a few books, even a startup. Companies are hiring chaos engineers and creating entire teams focused on chaos engineering. This talk is about strategies for launching a chaos engineering movement at your company, as well as the challenges and results you can expect.

View Video

Accelerating Incident Response

Oct 11, 2018 By PagerDuty In PagerDuty

Incidents are never fun, but a bad incident response process makes them even less so. How do technical teams mobilize the right people and provide the right context and tooling to rapidly take action and drive incident resolution? With the clock ticking and up to millions of dollars lost per minute of downtime, there’s no time to waste in assembling the right experts.

View Video

PagerDuty on PagerDuty

Oct 11, 2018 By PagerDuty In PagerDuty

Unless you spent the last few years on a remote deserted island, you might have noticed some changes in how work gets done at your company. Whether you call it Digital Transformation, DevOps, or Digi-DevOps-ification, it reaches far beyond just your Development and Operations teams.

View Video

How StatusHub Complements and Extends Your Incident Management Process?

Oct 10, 2018 By StatusHub In StatusHub

Although the main focus of StatusHub is incident communication, it complements each of the five activities of Incident Management: Identification, Categorization, Prioritization, Response, and Communication with the user community throughout the life of the incident.

Read Post

Postmortems and Retrospectives (class SRE implements DevOps)

Oct 9, 2018 By Google Operations In Google Operations

Even after a service has been restored, SREs still have a bit of work to do. In this video, Liz and Seth discuss the postmortem process that SREs follow. Blameless postmortems and retrospectives are key to learning from failures and preventing recurrence. You will learn about the importance of conducting a postmortem, strategies for conducting a blameless postmortem, and techniques for trending retrospectives across your entire organization to gain better insights to prevent service disruptions in the future.

View Video

Overrides, the Most Human Feature in PagerDuty

Oct 9, 2018 By Lisa Yang In PagerDuty

If you’ve ever been on call, you know that the incidents don’t stop because you have the flu. Or when you’re attending your child’s high school graduation. Or, as I found out firsthand, even when you’re at your own wedding. Confucius once said, “If you have never had a major occasion happen while you are on call, then you may not have ever lived.” (Okay, I totally made that one up.)

Read Post

It's Time to Start Talking about Digital Operations

Oct 9, 2018 By Isaac Sacolick In BigPanda

IT operations teams have some of the most stressful jobs in IT. Keeping data centers online, servers running, enterprise systems functioning, and applications performing, all while responding to incidents and requests, is hard work. Monitoring systems provide visibility, and change management practices give IT some control over the network and environment, yet IT operations teams constantly feel like they are fighting a losing battle.

Read Post

AlertOps Announces Playbook Automation Focusing on Critical Enterprise Needs in Fast-growing Incident Response Market

Oct 9, 2018 By AlertOps In AlertOps

CHICAGO, Oct. 9, 2018 /PRNewswire/ -- AlertOps, an Illinois-based digital operations management and real-time collaboration platform, announces a renewed focus on enterprises in the IT Operations Management, DevOps, and SecOps spaces. CIOs and IT leaders need vendors that can merge technology and business scenarios to solve complex collaboration and communication problems.

Read Post
