The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.
What makes an engineering team? Communication, collaboration, process, order, and common goals. Otherwise, they would just be a bunch of engineers. The same is true of their tools. Connectivity and process turn a bunch of tools into a DevOps toolchain. If you have a DevOps toolchain, you can use it to easily build an incident response process.
The role of an engineer at a startup is a tangled web: as well as writing code, you have to be your own product manager, QA tester, customer support and designer. But there’s another hat that you have to wear which you might not have thought about: copywriter. All products have copy, from welcome messages to text on a submit button. At incident.io, we have to put on our copywriting hats every time we add a new feature.
We have released an update for Enterprise Alert 9 (version 9.3) that revolutionizes our OPC connector and also includes some bug fixes. Read all the details in this article.
From the first alert, through the incident itself, to the post-incident review, great communication is the key to effective incident response.
When downtime strikes any distributed software deployment or platform, it's all hands on deck until the lights are green and service is restored. This process, from the recognition of a problem to a deployed solution, is most commonly measured as MTTR (mean time to resolution). In just the last few years, DevOps and site reliability engineering (SRE) professionals have developed sophisticated new models for how they work and audit their successes. In 2022, MTTR is one of the most widely used software performance success metrics.
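As a rough illustration of the metric described above, MTTR can be computed as the average time between recognizing a problem and deploying its fix. The sketch below is a minimal example, not a standard implementation: the function name `mean_time_to_resolution` and the incident timestamps are made up for illustration, and real teams may define the start and end points differently.

```python
from datetime import datetime, timedelta


def mean_time_to_resolution(incidents):
    """Average the detected-to-resolved duration over (detected, resolved) pairs."""
    if not incidents:
        raise ValueError("need at least one incident")
    # Sum the individual durations, starting from a zero-length timedelta.
    total = sum(
        (resolved - detected for detected, resolved in incidents),
        timedelta(),
    )
    return total / len(incidents)


# Hypothetical incident log: (problem recognized, fix deployed).
incidents = [
    (datetime(2022, 3, 1, 9, 0), datetime(2022, 3, 1, 9, 30)),     # 30 minutes
    (datetime(2022, 3, 4, 14, 0), datetime(2022, 3, 4, 15, 30)),   # 90 minutes
    (datetime(2022, 3, 9, 22, 15), datetime(2022, 3, 9, 23, 15)),  # 60 minutes
]

mttr = mean_time_to_resolution(incidents)
print(mttr)  # 1:00:00
```

Because it is a plain average, a single long outage can dominate the figure, so it is often worth looking at the individual durations alongside it.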
If you’re working at an early stage startup and looking to get some good incident management foundations in place without investing excessive time and effort, this guide is quite literally for you. There’s an enormous amount of content available for organisations looking to import ‘gold standard’ incident management best practices – things like the PagerDuty Response site, the Atlassian incident management best practices, and the Google SRE book.
Last year, we released PagerDuty Rundeck Actions, a PagerDuty add-on product that connects responders to automated diagnostics and remediation for common problems directly in the PagerDuty incident response workflow. After working with our customers and listening to the community, we are excited to announce that PagerDuty Rundeck Actions now integrates with PagerDuty’s Slack integration.