Operations | Monitoring | ITSM | DevOps | Cloud

Incident Management

The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Round Robin Escalation: An Efficient Way to Distribute On-Call Responsibilities

Nowadays, organizations address a high volume of incidents everyday. With so much happening, responders can be overwhelmed by the volume of incidents and may end up de-prioritizing certain important incidents. Hence, it is important to have an efficient on-call scheduling and escalation process in place. In this blog, we will explore how Round Robin Escalations can help distribute on-call load and set up efficient on-call schedules. This blog covers the following pointers.

Bridging the gap between Engineering and Customer Support during incidents

Customer trust and satisfaction are the most important currency your business can own. No matter how brilliant your product, without happy customers your business will struggle. When everything is running smoothly, it’s easy to feel that heady dose of customer love. It’s when things break during an incident that these relationships are really put to the test.

The Five Main Components of a Fully Developed EHR System

The adoption of electronic health record (EHR) systems has seen tremendous growth across geographies, especially in the US. According to American Hospital Association data shared by the Office of the National Coordinator for Health Information Technology, over 93% of American hospitals are enabled by some form of EHR in their organization. Implementing an EHR system in your clinic or hospital is a big decision.

Get started with Grafana OnCall and Terraform

Managing on-call schedules and escalation chains, especially across many teams, can get cumbersome and error prone. This can be especially difficult without as-code workflows. Here on the Grafana OnCall team, we’re focused on making Grafana OnCall as easy to use as possible. We want to make it easier to reduce errors with your on-call schedules, create schedule and escalation templates quickly, and fit on-call management into your existing as-code patterns.

Healthchecks + Squadcast Integration: Routing Alerts Made Easy

Healthchecks is a cron job monitoring service which listens to HTTP requests and email messages ("pings") from your cron jobs and scheduled tasks ("checks"). It lets you update your job to send an HTTP request to the ping URL every time the job runs. When your job does not ping Healthchecks.io on time, then you will receive an alert! If you use Healthchecks for your monitoring needs, you can now integrate it with Squadcast to route detailed alerts from Healthchecks to the right users in Squadcast.

Introduction to Service Catalog | Service Ownership | Service Classification Squadcast

To make service management a breeze, we bring to you our improved Service Catalog. The Service Catalog is designed to improve Service Classification and bring more transparency to Service Ownership within your org. This video explains how a consolidated summary of all active services from a single dashboard can help you better track your service health.
Sponsored Post

How To Reduce Incident Tickets

In IT environments, incidents happen all the time and it's impossible to prevent all of them. Regardless of the available software solutions or the level of technical training of both users and developers, no organization is immune to incidents. The increased dependence on IT infrastructure to provide core services means that any disruption in IT services can cause any organization significant financial and reputational harm. For example, IT service providers need to resolve customer support tickets following the service-level agreements (SLAs), and failing to do so makes them liable for breaching such agreements.

Sponsored Post

What are Runbooks? And why are they needed?

Imagine being an Ops engineer in a team just struck by tragedy. Alarms start ringing, and incident response is in full force. It may sound like the situation is in control. WRONG! There's panic everywhere. The on-call team is scrambling for the heavenly door to redemption. But, the only thing that doesn't stop - Stakeholder Inquiries. This situation is bad. But it could be worse. Now imagine being a less-experienced Ops engineer in a relatively small on-call team struck by tragedy. If you don't have sufficient guidance, let alone moral support- you're toast.

RESOLVE '22: Expert predictions for AIOps 2022-2025

BigPanda’s RESOLVE ‘22 conference hosted a number of luminaries in the AIOps and IT Ops world, so naturally we needed to get their thoughts on the future of the market and where they see AIOps going in the next few years. Our guests for the session titled Expert predictions for AIOps 2022-2025 were from the press, investor community, analyst community and vendor world.

Using StatusPage at squadcast | SRE Best practices | Squadcast

Let your customers know how your Services are doing, without them having to ask you about it. One of the core principles of SRE is Transparency and Status Pages help you communicate the status of your Services to your customers at all times, as opposed to you getting to know the status of your Services through support tickets logged by your customers.