Incident Management

The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Making Observability Actionable at Scale - Sisir Koppaka | DBS DevConnect 2019

Many organisations already possess a vast amount of existing data about production systems. As customer expectations evolve, organisations are often challenged to find more proactive ways of dealing with traditionally reactive incident response activity. In this talk, we discuss approaches to unlock value from this data by making it truly actionable.

Docker Commands Cheat Sheet

In this article I will highlight the 6 key Docker commands I use on a daily basis while using Docker in the real world. By no means is this an extensive list of commands; I kept it short on purpose so you could use it as a quick reference guide. I’ve also omitted the topic of building images and the commands that are associated with that.

OnPage Celebrates Successful Year

OnPage welcomes the new year with open arms. Though the team is excited for the new decade, we’d like to look back at our organizational growth and success in 2019. The previous year brought several Gartner mentions and the release of innovative new OnPage capabilities. This post discusses these notable accomplishments in detail.

The Role of Live Event Notifications in Your Incident Response Plan

According to a study from the University of Maryland, a hacking attack occurs every 39 seconds. During a quick coffee break, your systems could be attacked up to a dozen times. Depending on how your alerts are set up, you might miss a dozen or more notifications. Missed or delayed alerts, and the resulting slow responses, provide attackers with more time. Every minute provides attackers another opportunity to damage your systems or steal your data.

Five IT Trends to Look Forward to in 2020

New Year’s Eve marks the transition into a new decade, beginning with personal resolutions and expectations for 2020. The same is true in the IT industry, as support teams expect to adopt trending technologies to reduce their mean time to repair (MTTR) and improve incident resolution. This post provides an in-depth look at five trends, discussing how growing technologies streamline IT workflows in the new year.

Incident Alert Routing - Getting woken up only by alerts that matter to you

Site reliability engineers have one of the toughest roles, if not the toughest, in any organization. While dealing with incidents is one part of the job, the other is to build reliable systems. Google’s SRE book sums up this approach nicely. One of the most important challenges for an SRE when it comes to balancing work between firefighting and toil reduction is the issue of alert noise.

Squadcast's Year in Review, 2019

We’re heading into 2020 with a platform full of features and a heart full of happiness! It’s the end of a decade and this year has been nothing short of great for us! 2019 gave us accelerated product growth and our team grew by 2x in size. We kick-started this year with a complete UI refresh and a whole bunch of new features. We also sponsored some of the major tech events and conducted our first-ever community-driven meetup!

Making on-call superheroes

Building a world-class service is as much about maintaining software as it is about developing it. On-call engineers are typically responsible for ensuring the reliability and availability of your service, i.e., your reputation and source of revenue. Robust on-call schedules ensure that the right people are ready to go during times of crisis. Yet organizations continue to depend on on-call schedules and incident response processes that are a source of stress, anxiety, or panic for employees.

5 Best Practices on Nailing Postmortems

Reading about postmortem best practices can sometimes be quite different from seeing them in action. Postmortems are like snowflakes; no two will ever look the same. There isn’t a definitive template for success that will work in every situation, but there are some practices and procedures when writing postmortems that can help. Here are five practices that can boost the effectiveness of your postmortems, with examples of postmortems or procedures that demonstrate these methods.

Gartner Publishes New Report: Six Smart Steps to ITSM Tools

Information technology service management (ITSM) tools streamline and regulate how IT services are delivered. ITSM tools include help-desk (e.g., ConnectWise Manage and ServiceNow) and monitoring software, providing smart ticketing capabilities and live system statuses, respectively. Unfortunately, Gartner Research reports that organizations tend to overbuy ITSM tools beyond their needs. For instance, organizations purchase unnecessary capabilities and features when adopting new ITSM technology.