
Latest News

Share your failures, fix them faster with shareable activities

When you’re working with a Continuous Delivery workflow, you rely on building and deploying your websites in such a way that any improvement can be released into production at any time. Identifying and fixing failures quickly is key to enabling rapid development cycles. But what happens when you’re looking into a failed build step with no clue how to address it? You can now share links to specific lines within the activity logs.

What we learned from AWS's us-east-1 outage

In case you missed it, for several hours on December 7, 2021, AWS's us-east-1 region had an outage impacting multiple AWS APIs, taking out various websites across the internet. According to our own monitoring at OnlineOrNot, the outage started at 2021-12-07 15:32 UTC and began to recover well at 2021-12-07 22:48 UTC (with minor signs of life for a few minutes around 2021-12-07 20:08 UTC). Had we relied solely on AWS to update their status page before reacting, we would have been waiting a while.
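Taking the start and recovery timestamps above at face value, "several hours" works out to a little over seven. A minimal sketch of that arithmetic, assuming both timestamps are UTC as stated (the function name `outage_duration` is illustrative, not from any monitoring product):

```python
from datetime import datetime, timezone

# Timestamps as reported in the post, interpreted as UTC.
OUTAGE_START = datetime(2021, 12, 7, 15, 32, tzinfo=timezone.utc)
RECOVERY = datetime(2021, 12, 7, 22, 48, tzinfo=timezone.utc)


def outage_duration(start: datetime, end: datetime) -> tuple:
    """Return (hours, minutes) between two timezone-aware datetimes."""
    total_minutes = int((end - start).total_seconds() // 60)
    return divmod(total_minutes, 60)


if __name__ == "__main__":
    hours, minutes = outage_duration(OUTAGE_START, RECOVERY)
    print(f"Outage lasted about {hours}h {minutes}m")  # Outage lasted about 7h 16m
```

This ignores the brief signs of life around 20:08 UTC and measures only from first failure to sustained recovery.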

What Value Does a Cloud Data Platform Hold For Your Business?

It has been roughly two decades since cloud computing first appeared on the scene, and yet, despite overwhelming evidence of the operational productivity improvements, cost savings, and competitive advantages it provides, a significant portion of the banking industry still has not adopted it.

Monitor the Azure Cosmos DB integrated cache with Datadog

Azure Cosmos DB is a fully managed NoSQL database that scales automatically with load and supports multiple APIs. This makes it easy to incorporate into your applications while removing the need to maintain your own database servers. The Cosmos DB integrated cache, which is now in public preview, is a new offering that can help reduce costs and improve performance for Azure Cosmos DB.

Incident Review - AWS Outages Crash Major Online Services - Including Amazon

The following is an analysis of the Amazon Web Services incident on December 7, 2021. Millions of users were affected by an Amazon Web Services outage that took down major online services such as Amazon, Amazon Prime, Amazon Alexa, Venmo, Disney+, Instacart, Roku, Kindle, and multiple online gaming sites. The outage, which originated in the US-EAST-1 region, was still ongoing at the time of publication.

What is Cloud Repatriation and How to Avoid It with Cloud Cost Management

Cloud computing is one of the great technologies of our era, and enterprises everywhere are in a hurry to migrate to it. However, one of the less-talked-about trends of our time is cloud repatriation: enterprises reversing that decision, leaving the cloud, and returning to an on-prem setting. According to TechTarget, 85% of enterprises reported plans to repatriate workloads from the public cloud in 2019.

AWS Outage on Dec. 7, 2021 - When Did You Know About It?

If something isn’t working as expected, your customers will want to know. How quickly did you know that AWS’s us-east-1 region was having issues? Was it from an article online? Customer requests flooding into your support queue? A tweet? Not being able to get into a PUBG match? Or, speaking of matches, were you unable to message your last Tinder connection?

Improving continuous verification: deploy fast and safely to production

Kubernetes and microservices have opened the door to smaller and more frequent releases, while DevOps CI/CD practices and tools have sped up software development and deployment. The dynamic nature of these cloud-native architectures makes modern applications not just complex but also difficult to monitor, and makes problems harder to find and fix.

Sponsored Post

Service Mocks: Scaling a SaaS Demo with Traffic Replay

Building, running and scaling SaaS demo systems that run around the clock is a big engineering challenge. Through the power of traffic replay, we scaled our demos in a huge way. A few weeks ago we launched a new demo sandbox, a second-generation version of the existing demo system I built a few months ago (codename: decoy). Because the traffic viewer page shows the most recent data by default, you need to constantly pump new data into it. Any real-time SaaS system has a similar requirement, so this needs to be planned.
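The post does not show how its replay works. One common way to keep a "most recent data" view populated is to take a recorded session and re-stamp it relative to the current time, looping it indefinitely. A minimal sketch under that assumption; `RecordedRequest`, `restamp`, and the `gap_s` parameter are hypothetical names, and a real replayer would also send each request to the demo backend at its scheduled time:

```python
import time
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass(frozen=True)
class RecordedRequest:
    # Hypothetical shape of one captured API call; real captures carry headers, bodies, etc.
    offset_s: float  # seconds since the start of the recording
    method: str
    path: str
    status: int


def restamp(recording: List[RecordedRequest], start: float,
            loops: int = 1, gap_s: float = 1.0) -> Iterator[Tuple[float, RecordedRequest]]:
    """Yield (timestamp, request) pairs, shifting the recording so it begins at `start`
    and repeating it `loops` times with `gap_s` seconds between repetitions."""
    if not recording:
        return
    ordered = sorted(recording, key=lambda r: r.offset_s)
    period = ordered[-1].offset_s + gap_s  # length of one pass plus the pause
    for i in range(loops):
        base = start + i * period
        for req in ordered:
            yield base + req.offset_s, req


if __name__ == "__main__":
    capture = [
        RecordedRequest(0.0, "GET", "/api/users", 200),
        RecordedRequest(2.5, "POST", "/api/orders", 201),
    ]
    # Re-stamp relative to "now" so the newest entries always look fresh.
    for ts, req in restamp(capture, start=time.time(), loops=2):
        print(f"{ts:.1f} {req.method} {req.path} -> {req.status}")
```

Generating timestamps separately from sending keeps the schedule easy to test; the sending loop can then sleep until each timestamp and fire the request.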