Latest News

Uncovering the Importance of Mean Time Between Failures

Dec 10, 2021 By Christopher Gonzalez In OnPage

In the IT world, application service providers (ASPs) build customer trust by ensuring the continuous, uninterrupted availability of their services and software. Service availability allows customers to operate normally and generate revenue without being directly impacted by their providers’ system failures. Though providers work to ensure system uptime, they are often challenged by unexpected technical issues that impact customer-facing systems.

Monthly Moo Update | December 2021

Dec 10, 2021 By Adam Frank In Moogsoft

What a year 2021 has been for us all. We are extremely proud of the continuous innovation and delivery of new features and functionality we have provided throughout the year, all while maintaining enterprise scale and uptime that could win awards. We’ve heard success story after success story from our brilliant customers, each unique in their own way. We couldn’t have had the successful year we’ve had without you, and it’s been our honor to be part of your success.

BigPanda's ServiceNow integration just got better

Dec 9, 2021 By Bhushan Jadhav In BigPanda

ServiceNow is widely used across Fortune 1000 and Global 5000 enterprises, so it’s no wonder that the majority of BigPanda customers use ServiceNow and integrate with it to streamline their ticketing requests. BigPanda’s AIOps Event Correlation and Automation Platform provides context-rich incidents to IT Ops teams relying on ServiceNow and helps them gain end-to-end real-time visibility into their operations.

What we learned from AWS's us-east-1 outage

Dec 8, 2021 By Max Rozen In OnlineOrNot

In case you missed it, for several hours on December 7, 2021, AWS's us-east-1 region had an outage impacting multiple AWS APIs, taking out various websites across the internet. According to our own monitoring at OnlineOrNot, the outage started at 2021-12-07 15:32 UTC and showed substantial recovery at 2021-12-07 22:48 UTC (with minor signs of life for a few minutes around 2021-12-07 20:08 UTC). Had we relied solely on AWS to update their status page before reacting, we would have been waiting a while.

SRE Incident Management: Overview, Techniques, and Tools

Dec 8, 2021 By Jacob Hall In Dotcom-Monitor

In the world of a site reliability engineer (SRE), failure is not only an option, but also expected. Systems, web applications, servers, devices, etc., are all prone to performance issues and unexpected outages at some point. It is an unavoidable fact. These unexpected failures can lead to huge revenue losses, loss of customer trust, and, depending on the industry, possibly fines. Fortunately, SRE incident management is one of the core practices used to limit the disruption caused by unexpected issues.

Why automation is the incident response 'easy button' MSPs & IR firms have been waiting for

Dec 7, 2021 By Noam Morginstin In Exigence

The managed security services market is booming. Coming in at $22.8 billion in 2021, it is projected to nearly double in just five years, reaching $43.7 billion by 2026. Moreover, cloud-based managed security services are poised to be the major growth driver for the broader MSP market, which came in at $219.59 billion in 2021 and is expected to reach $557.10 billion by 2028. As these figures show, providing robust security services is a key competitive differentiator in the lucrative MSP market.

Incident Review - AWS Outages Crash Major Online Services - Including Amazon

Dec 7, 2021 By Karthik Suresh, Carol Hildebrand In Catchpoint

The following is an analysis of the Amazon Web Services incident on 12/07/2021. Millions of users were affected by an Amazon Web Services outage that took down major online services such as Amazon, Amazon Prime, Amazon Alexa, Venmo, Disney+, Instacart, Roku, Kindle, and multiple online gaming sites. The outage, which originated in the US-EAST-1 region on Dec. 7, 2021, is still ongoing at the time of blog publication.

Space Made Simple: How PagerDuty Enabled Loft Orbital to Achieve Incident Response Lift Off

Dec 7, 2021 By PagerDuty In PagerDuty

The next great space race is on. Today, there are multiple companies competing to earn their slice of a global space industry set to be worth more than $1 trillion by 2040. However, launching a satellite into space still isn’t an option for most organizations due to the prohibitive costs and complex engineering required.

The Cultural Shift to Modern IT Operations

Dec 6, 2021 By xMatters In xMatters

In the world of always-on services, many organizations have taken the path to modernize their IT operations to provide greater agility, lower cost, and most importantly, to deliver frictionless digital customer experiences. Is your DevOps team deploying more frequently than operations can support? Are you struggling to keep up with the maintenance issues associated with aging software? Modernizing your IT operations can be the key to overcoming these complexities.

What's New: Updates to Runbook Automation, Event Intelligence, Partner Integrations, and More!

Dec 6, 2021 By Vera Chan In PagerDuty

We’re excited to announce a new set of updates and enhancements to the PagerDuty platform. The product team has been hard at work on updates spanning Event Intelligence, Runbook Automation, Applications with Monitoring Tools, PagerDuty, and PagerDuty Community Events.
