
Incident Management

The latest news and information on incident management, on-call, incident response, and related technologies.

Create a service catalog that grows with you

When your incident response process is centered around a service catalog, responders are able to more quickly pinpoint the service or functionality that’s down, bring in the team or experts, and then get to solving the problem faster. Saving even a few minutes can have a big impact on decreasing the costs around incidents and outages, so having up-to-date service details at your fingertips can make all the difference.

Squadcast + HaloPSA Integration: Enabling Streamlined Incident Response & Alerting

HaloPSA is a modern and intuitive all-in-one professional services automation (PSA) solution, designed for service providers. HaloPSA’s cloud platform helps you manage your entire business, modernize customer experience and automate your service. If you use HaloPSA for PSA requirements, you can integrate it with Squadcast, an end-to-end Incident Response and Reliability Workflow platform, to route detailed alerts from HaloPSA to the right users in Squadcast.

Developer environments should be cattle, not pets

"Cattle, not pets" is a DevOps phrase referring to servers that are disposable and automatically replaced (cattle) as opposed to indispensable and manually managed (pets). Local development environments should be treated the same way, and your tooling should make that as easy as possible. Here, I’ll walk through an example from one of my first projects at incident.io, where I reset my local environment a few times to keep us moving quickly.

Admin Panel - General Settings - xMatters Support

You can define the details for a company using the General Settings page accessed via the Admin menu. Depending on your permission level, you may not be able to view the General Settings screen. In addition, the settings you see on this page depend on both your role permissions and the features available in your product plan.

Redundancy for IT resilience: The backup guide for a disaster-proof network

Around six years ago on a Wednesday morning, software professionals worldwide were startled by a tweet from GitLab stating that they had accidentally deleted their production data, causing their site to go offline. Unfortunately, at that point in time, the open-source code repository giant had no idea that it would take them another 36 hours to restore their systems only to learn that 5,000 projects and 700 new user accounts were affected while they were fixing the outage.

The Guide to SRE Principles

Site reliability engineering (SRE) is a discipline in which automated software systems are built to manage the development operations (DevOps) of a product or service. In other words, SRE automates the functions of an operations team via software systems. The main purpose of SRE is to encourage the deployment and proper maintenance of large-scale systems.

The 7 IT Automations for Highly Effective Organization: IT Incident Remediation | Web App Down

No organization is immune to outages, unplanned interruptions, or reductions in the quality of normal service. But a streamlined response plan helps ensure these situations are handled effectively and normal service is restored quickly. In a world where IT efficiency is measured by mean time to resolution, triaging and remediating alarms can have a direct, positive impact on the business.

Komodor + Squadcast Integration: Simplifying Kubernetes Monitoring & Incident Response

Kubernetes (K8s) is a powerful tool for container orchestration, but it presents unique challenges when it comes to monitoring and incident response. Managing K8s requires 360° visibility into your environment and proactive health monitoring, along with the right incident management and suppression capabilities. In this article, we'll explore the benefits of integrating Squadcast with Komodor, two powerful tools that can help you overcome these challenges.

How metrics can make or break your IT operations strategy

IT people know that data is king, especially in optimizing IT operations. However, figuring out which metrics to collect and how to collect them can be challenging. IT teams have to factor in what IT directors, team managers, and the people overseeing operations want, what they’re concerned about, and what they consider important.