Incident Management

The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

SRE Leaders Panel: Business Agility is what matters, SRE can help you get there

Blameless recently had the privilege of hosting SRE leaders Garima Bajpai, Founder at Community of Practice - DevOps Canada, and Jason Fraser, Delivery Lead at VMware Tanzu, to discuss the value of crisis during incident response, the best and worst tech transformations they’ve seen, how reliability impacts the flow of value, and more.

Concrete Steps to Reducing MTTR

In today’s data-centric world, metrics define most performance benchmarks. The time between when an event starts and when it ends shows how well a system can handle and process such events. One such metric is MTTR. MTTR usually stands for Mean Time To Resolution, but it has held several meanings over the years. It measures how well a system can bounce back from errors and provide long-lasting solutions.
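As a rough illustration of the arithmetic, the sketch below assumes MTTR means the average of (end time minus start time) over a set of resolved incidents. The function name and the incident timestamps are hypothetical and only serve the example.

```python
from datetime import datetime, timedelta


def mean_time_to_resolution(incidents):
    """Average the (end - start) duration over (start, end) pairs.

    Assumption: every incident in the list has already been resolved.
    """
    if not incidents:
        raise ValueError("need at least one resolved incident")
    total = sum((end - start for start, end in incidents), timedelta())
    return total / len(incidents)


# Hypothetical incidents: 30, 90 and 120 minutes long.
incidents = [
    (datetime(2021, 4, 1, 9, 0), datetime(2021, 4, 1, 9, 30)),
    (datetime(2021, 4, 8, 14, 0), datetime(2021, 4, 8, 15, 30)),
    (datetime(2021, 4, 20, 22, 0), datetime(2021, 4, 21, 0, 0)),
]

mttr = mean_time_to_resolution(incidents)
print(mttr)  # 1:20:00
```

Because a plain mean is sensitive to a single very long incident, it can be worth looking at the median or a percentile alongside it.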

Monthly Moo Update | April 2021

I don’t know about you, but April traveled at the speed of light. A blink and it happened. Our teams have been working at the same speed throughout one of our favorite months of the year. With an incredible amount of updates, we’ve made our product even more transparent and easier to use. It’s not just our world-class documentation that enables you, it’s also the in-product visualizations and enablement that help guide you without you even realizing it.

Top SRE Toolchain Used By Site Reliability Engineers

We have compiled a list of the most popular and sought-after tools (some you may have heard of) that SREs need in their toolkit at every phase of a production system. Site reliability engineering (SRE) practices help organizations keep their deliverables running smoothly, with high reliability and resilience. This is achieved with a set of well-defined tools deployed at every phase of the production system to keep up with SRE best practices.

OnPage Recognized in Gartner's Latest Report on CC&C Systems

Gartner’s latest “Quick Answer” report discusses how clinical communication and collaboration (CC&C) systems can enhance pandemic-related provider and patient engagement. Modern healthcare delivery organizations (HDOs) invest in CC&C solutions to simplify communication among care teams consisting of physicians, nurses and critical support personnel. The OnPage team is pleased to be recognized as a vendor in Gartner’s latest CC&C publication.

Failover Conf 2021 Wrap-Up

That’s a wrap! Gremlin hosted Failover Conf 2: Fail Smarter on April 27, 2021. In attendance were over 500 SREs, developers, sales engineers, product managers, DevOps experts, C-level execs, and other reliability pros from around the globe! This year’s conference included discussions around the future of DevOps, strategies for building reliable teams, analyzing human error to create better systems, and more.

Ivanti Gives Voice to IT Incident Management Software

A protracted, exasperating customer service experience popped into my mind while reading this sentence in the Ivanti Voice data sheet: “One of the most frequent customer complaints about call centers is having to repeat information.” Ain’t that the truth. Here’s a brief personal experience.