
Incident Management

The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

How our product team use Catalog

We recently introduced Catalog: the connected map of everything in your organization. In the process of building Catalog as a feature, we’ve also been building out the content of our own catalog. We'd flipped on the feature flag to give ourselves early access, and as we went along, we used this to test out the various features that Catalog powers.

Services are not special: Why Catalog is not just another service catalog

As you may have already seen, we’ve recently released a Catalog feature at incident.io. While designing and building it, we took an approach that’s a tangible departure from a traditional service catalog. Here’s how we’re different, and why.

Azure Incident Management with Escalation Policy

These days, businesses rely heavily on cloud services like Microsoft Azure to power their operations. While Azure provides robust infrastructure and services, occasional issues and incidents can still occur. Serverless360 provides enhanced capabilities to monitor and manage Azure incidents in a system. But to ensure seamless operations and timely resolution of problems, it is crucial to have a well-defined escalation policy in place for Azure Incident Management.

The Unplanned Show, Episode 3: LLMs and Incident Response

A software engineer, a data scientist, and a product manager walk into a generative AI project… Using technology that didn’t exist a year ago, they identify a customer pain point they might be able to solve, build on teammates’ experience with building AI features, and test how to feed inputs and constrain outputs into something useful. Hear the full conversation here.

incident.io Catalog hands on lab

The incident.io Catalog is a connected, navigable map of "things" that exist in your organization. We can use it to describe an organization as a connected graph, and use that graph to drive powerful workflow automations during incidents. In this hands-on training session, we'll work through an example of building a catalog for a mock organization. We'll then use the catalog to solve some real business problems, including automated incident data attribution, and walk through some realistic workflows that show how it works and what it enables in the context of incident management.
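To make the "connected graph" idea concrete, here is a minimal sketch of how catalog entries might link to one another and how a workflow could walk those links to attribute an incident to an owner. It is an illustration only, not the incident.io API: `CatalogEntry`, `CATALOG`, `resolve`, and every entry name below are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One "thing" in the catalog: a service, a team, a person, and so on."""
    kind: str
    name: str
    # Link name -> key of another entry (an edge in the graph).
    links: dict[str, str] = field(default_factory=dict)


# A tiny catalog for a mock organization; every name here is invented.
CATALOG = {
    "service/checkout": CatalogEntry("service", "checkout", {"owner": "team/payments"}),
    "team/payments": CatalogEntry("team", "payments", {"lead": "person/alice"}),
    "person/alice": CatalogEntry("person", "alice"),
}


def resolve(start: str, path: list[str]) -> CatalogEntry | None:
    """Follow a chain of link names from a starting entry.

    Returns None if any hop along the path is missing.
    """
    key = start
    for link_name in path:
        entry = CATALOG.get(key)
        if entry is None or link_name not in entry.links:
            return None
        key = entry.links[link_name]
    return CATALOG.get(key)
```

With this shape, an incident that names the affected service can be attributed by calling `resolve("service/checkout", ["owner"])` for the owning team, or `resolve("service/checkout", ["owner", "lead"])` for the person to notify.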

How AIOps Revolutionizes Observability for TechOps Teams

Managing over 1000 services and applications is daunting for any organization’s IT and Tech operations team. With a diverse mix of on-premises legacy systems and modern cloud stacks, the sheer volume of activity can overwhelm even the most skilled ITOps teams. The task is made more difficult by the fact that observability is fragmented. On average, organizations depend on 21 systems that produce metrics, logs, traces, and alerts for various services.

Sponsored Post

Squadcast's Improved Mobile App for Better Incident Response

The 2020 pandemic has definitely changed the way teams operate across the globe. Many of you may have already experienced moving from 100% office work to 100% remote work, and now that it has been almost three years since the pandemic started, many of us have settled into hybrid models. We at Squadcast value efficient communication, reaching the right people during a crisis, and the freedom to resolve critical incidents from anywhere, anytime. With that in mind, we have made major improvements to our mobile app to help you take part in Incident Response activities effectively, at any time and from anywhere in the world.

Fast Track Video Series: See a demonstration of BigPanda's Incident Intelligence and Automation Platform

BigPanda transforms millions of events into a small number of actionable alerts, no matter where they originate. How? Watch this video to learn more. The video shows how BigPanda allows you to normalize tag values across all tools, aiding event enrichment and correlation. The open integration manager then makes it easy to pre-process the event data, helping to filter unwanted events from the feed. The filtering strips out duplicate and low-relevancy events and keeps them from cluttering up the console.
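As a rough illustration of the general technique described here (normalizing tag values, then dropping duplicate and low-relevancy events), the sketch below shows one way such a pre-processing step could look. It is not BigPanda's implementation or API: the field names (`host`, `check`, `severity`), the alias table, and the functions are all assumptions made for the example.

```python
# Hypothetical mapping from tool-specific tag values to one canonical value.
SEVERITY_ALIASES = {
    "crit": "critical",
    "sev1": "critical",
    "warn": "warning",
    "information": "info",
}

# Severities treated as low relevancy in this sketch.
LOW_RELEVANCY = {"info"}


def normalize(event: dict) -> dict:
    """Return a copy of the event with its severity tag lowercased and canonicalized."""
    severity = str(event.get("severity", "")).lower()
    return {**event, "severity": SEVERITY_ALIASES.get(severity, severity)}


def filter_events(events: list) -> list:
    """Normalize events, then drop low-relevancy ones and exact repeats."""
    seen = set()
    kept = []
    for event in map(normalize, events):
        if event["severity"] in LOW_RELEVANCY:
            continue
        # Events that share host, check, and severity count as duplicates here.
        fingerprint = (event.get("host"), event.get("check"), event["severity"])
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        kept.append(event)
    return kept
```

Normalizing before deduplicating matters: two tools reporting `CRIT` and `sev1` for the same host and check only collapse into one event once both values map to `critical`.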

Cyberattack Prevention with AI

Cyberattack prevention involves proactive steps organizations take to protect their digital assets, networks, and systems from potential cyber threats. Preventive measures, such as a combination of best practices, policies, and technologies, are employed to identify and mitigate security breaches before they can cause significant damage.

There Are No Repeat Incidents

People seem to struggle with the idea that there are no repeat incidents. It is very easy and natural to see two distinct outages, with nearly identical failure modes, impacting the same components, and with no significant action items, as repeat incidents. However, when we look at the responses and their variations, we can find key distinctions that show the incidents to be related, but not identical.