The latest news and information on incident management, on-call, incident response, and related technologies.

VictorOps and Relay for Incident Response

VictorOps is an incident response tool whose mission is straightforward: “To make being on call suck less.” It enables teams to quickly detect and respond to problems such as service degradation or an outage. VictorOps supports a wide range of external integrations that extend its capabilities by connecting different parts of your DevOps toolchain.

Incident Ready: How to Chaos Engineer Your Incident Response Process

We’re pretty sure using a real incident to test a new response process is not the best idea. So, how do you test your process ahead of time? In this video, FireHydrant CEO Robert Ross shares how FireHydrant customers leverage best practices to break, mitigate, resolve, and fireproof incident processes, and shows how to use chaos engineering philosophies to stress test three critical parts of a great process.

Microsoft's 3 major incidents in 10 days: where did they go wrong?

Just in case you haven’t heard, last week Microsoft experienced a huge outage that prevented users from accessing Office 365, its cloud-based subscription service with 200 million monthly active users. This was the third outage in ten days, and the company received a deluge of complaints from customers who saw a 'something went wrong' message when they tried to access their accounts.

October 2020 Update: Mute override for iPhone (Critical Alerts), undo and more

Our October update brings the long-awaited mute override on iPhone (‘critical alerts’). We also introduce an undo action for Signl acknowledgements and closures. And in the web app, you can now acknowledge or close multiple Signls at once. All new features are introduced below. Enjoy!

PagerDuty Summit: Lacework on the Shared Irresponsibility Model of Cloud Security

Cloud security has become increasingly complex of late. Cloud providers use tens of thousands of APIs, container orchestration systems are growing in number and complexity, and more platforms and services are entering the cloud-native ring. What’s more, each of these components poses a potential security risk to organizations. And it’s you, the customer, who is responsible for the configuration and security of those components.

How SIGNL4 provides for a digital handover procedure

Handover procedures in operations and maintenance are a key element of business continuity. Because work in this field is usually organized in shifts, it is essential to keep track of critical incidents, machine breakdowns, job ownership and completion, open or unresolved issues, and other related items. This knowledge is key to a timely or even proactive response, for instance when issues resurface.