The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Lessons from Incident Management and Postmortems at Atlassian - Jim Severino

Oct 15, 2020 By Gremlin In Gremlin

How do you run incidents and postmortems at a company with thousands of engineers spread across the globe? Jim Severino shares what worked (and what didn't) for Atlassian.


Incident Ready: How to Chaos Engineer Your Incident Response Process | FireHydrant

Oct 15, 2020 By FireHydrant In FireHydrant

We’re pretty sure using a real incident to test a new response process is not the best idea. So, how do you test your process ahead of time? In this video, FireHydrant CEO, Robert Ross, will share how FireHydrant customers leverage best practices to break, mitigate, resolve, and fireproof incident processes. We’ll show you how to use chaos engineering philosophies to stress test 3 critical parts of a great process.


Microsoft's 3 major incidents in 10 days, where did they go wrong?

Oct 15, 2020 By Noam Morginstin In Exigence

Just in case you haven’t heard, last week Microsoft experienced a huge outage that prevented users from accessing its Office 365 cloud-based subscription service which serves 200 million active monthly users. This latest outage was the third in ten days, causing the company to receive a deluge of customer complaints about a 'something went wrong' message that popped up when they tried to access their accounts.


October 2020 Update: Mute overwrite for iPhone (Critical Alerts), undo and more

Oct 14, 2020 By René In SIGNL4

Our October update brings the long-awaited mute-overwrite on iPhone (‘critical alerts’). We also introduce an undo action for Signl acknowledgements or closures. And in the web app you can now acknowledge and close multiple Signls at once. All new features are introduced below – enjoy.


PagerDuty Summit: Lacework on the Shared Irresponsibility Model of Cloud Security

Oct 13, 2020 By PagerDuty In PagerDuty

Cloud security has become increasingly complex of late. Cloud providers use tens of thousands of APIs, container orchestration systems are growing in number and complexity, and more platforms and services are entering the cloud-native ring. What’s more, each of these components poses a potential security risk to organizations. And it’s you, as the customer, who is responsible for the configuration and security of those components.


Chaos Engineering Processes

Oct 13, 2020 By FireHydrant In FireHydrant

You can use chaos engineering to test processes just as you can use it to test how systems fail.


How SIGNL4 provides for a digital handover procedure

Oct 9, 2020 By Matt In SIGNL4

Handover procedures in operations and maintenance are a key element of business continuity. As work in this field is usually organized in shifts, it is essential to keep track of any critical incidents, machine breakdowns, job ownership, completion, issues that are still open or unresolved and other related items. Such knowledge has a significant impact on a timely or even proactive response, for instance if issues re-surface.


Enterprise Alert 2019 Update 8.5.1 released

Oct 9, 2020 By Derdack In Derdack

On October 7th we released a new Enterprise Alert version, 8.5.1. This release includes the following enhancements.


Streamline communication workflows with the Datadog Slack App

Oct 8, 2020 By Natalie Altman In Datadog

Sharing information about the health and performance of an application is a critical part of any team’s daily workflow. That’s why we’re excited to announce the Datadog Slack App, which simplifies crucial communication tasks by deepening the integration between Datadog and Slack.


How to: Automatically Archive Incident Slack Channels using conditions in FireHydrant Runbooks

Oct 8, 2020 By Rich Burroughs In FireHydrant

FireHydrant’s Slack integration is a great way to speed up your incident response, especially if FireHydrant Runbooks is automatically creating channels in your Slack workspace for each incident. “But what happens after the incident?” First of all, you shouldn’t have to manually archive those Slack channels, especially when you don’t want them clogging up the Slack navigation bar.
