
Latest News

Slash MTTR, avoid costly downtime with improved cross-team collaboration

Every second counts when IT teams are called upon to resolve business-impacting issues. In modern enterprises, poor communication, fragmented toolchains and spiralling IT complexity can conspire to slow down incident response, putting service availability and ultimately customer satisfaction in peril.

Use your words: the importance of clear writing in product development

The role of an engineer at a startup is a tangled web: as well as writing code, you have to be your own product manager, QA tester, customer support and designer. But there’s another hat that you have to wear which you might not have thought about: copywriter. All products have copy, from welcome messages to text on a submit button. At incident.io, we have to put on our copywriting hats every time we add a new feature.

Sponsored Post

What is MTTR? Resolve incidents faster through ops, alerting and documentation

When downtime strikes any distributed software deployment or platform, it's all hands on deck until the lights are green and service is restored. The length of this process, from the recognition of a problem to a deployed solution, is most commonly measured as MTTR (mean time to resolution). In just the last few years, DevOps and site reliability engineering (SRE) professionals have developed sophisticated new models for how they work and audit their successes. In 2022, MTTR is one of the most widely used software performance success metrics.
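
As a rough illustration of the metric, MTTR can be computed as the average of (resolved time minus detected time) across a set of incidents. The sketch below assumes each incident is recorded as a pair of timestamps; the function name `mean_time_to_resolution` and the sample incidents are hypothetical and not taken from any particular tool.

```python
from datetime import datetime, timedelta


def mean_time_to_resolution(incidents):
    """Average time from detection to resolution.

    `incidents` is a list of (detected_at, resolved_at) datetime pairs.
    This record shape is an assumption made for illustration only.
    """
    if not incidents:
        raise ValueError("need at least one incident to compute MTTR")
    # Sum the individual durations, starting from a zero-length timedelta.
    total = sum(
        (resolved - detected for detected, resolved in incidents),
        timedelta(),
    )
    return total / len(incidents)


# Hypothetical sample data: three incidents lasting 30, 90 and 60 minutes.
incidents = [
    (datetime(2022, 3, 1, 9, 0), datetime(2022, 3, 1, 9, 30)),
    (datetime(2022, 3, 4, 14, 0), datetime(2022, 3, 4, 15, 30)),
    (datetime(2022, 3, 9, 22, 15), datetime(2022, 3, 9, 23, 15)),
]

mttr = mean_time_to_resolution(incidents)
print(mttr)  # 1:00:00
```

Teams differ on where the clock starts (first alert, acknowledgement, or customer report) and where it stops, so the timestamps fed into a calculation like this should follow whatever definition your team has agreed on.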

The startup guide to sensible incident management

If you’re working at an early stage startup and looking to get some good incident management foundations in place without investing excessive time and effort, this guide is quite literally for you. There’s an enormous amount of content available for organisations looking to import ‘gold standard’ incident management best practices – things like the PagerDuty Response site, the Atlassian incident management best practices, and the Google SRE book.

Now You Can Invoke PagerDuty Rundeck Actions Within the PagerDuty Slack Integration

Last year, we released PagerDuty Rundeck Actions, a PagerDuty add-on product that connects responders to automated diagnostics and remediation for common problems directly in the PagerDuty incident response workflow. After working with our customers and listening to the community, we are excited to announce that PagerDuty Rundeck Actions now integrates with PagerDuty’s Slack integration.

Announcing Grafana Incident, smart incident management for your teams

A huge challenge when dealing with incidents is the coordination and communication needed to put things right. What’s happened so far? Who has tried what query? Did we remember to keep stakeholders informed? What is the severity of the incident? Does this affect customers? Figuring this out requires a lot of back and forth as new team members join the incident.