Latest Posts

Communicating to Users During Incidents

Imagine you're having a regular day at work. You open up your browser to double-check something for a client in the web app your team built for them, when suddenly you see an error page where the app should be. You hit refresh a few times, just to be sure. Nope. Still down. What happens next depends on how well your team has planned for incidents like this (some folks call it unplanned downtime).

Improving your team's on-call experience

Your engineers probably dislike going on-call for your services. Some might even dread it. It doesn't have to be this way. With a few changes to how your team runs on-call and deals with recurring alerts, you might find your team starting to enjoy it (as unimaginable as that sounds). I wrote this article as a follow-up to Getting over on-call anxiety.

Getting over on-call anxiety

You've joined a company, or have worked there a little while, and you've only just realised that you'll have to do on-call. You feel like you don't know much about how everything fits together. How are you supposed to fix it at 2am when you get paged? So you're a little nervous. Understandable. Here are a few tips to help you become less nervous.

What we learned from AWS's us-east-1 outage

In case you missed it, for more than seven hours on December 7, 2021, AWS's us-east-1 region had an outage that impacted multiple AWS APIs and took out various websites across the internet. According to our own monitoring at OnlineOrNot, the outage started at 2021-12-07 15:32 UTC and began to recover properly at 2021-12-07 22:48 UTC (with brief signs of life for a few minutes around 2021-12-07 20:08 UTC). Had we relied solely on AWS to update their status page before reacting, we would have been waiting a while.

Dealing with Noisy Error Monitoring

Say you've been tasked with monitoring an application, so you set up some alerts to let you know when errors come in. The minutes roll by, and the errors start coming... and they don't stop coming. Oh my, there seem to be quite a few errors coming through. Alerting on each error isn't going to help, so you'd better alert on changes in the error rate instead, right? Not quite. While there's no shortage of vendors that'll sell you on the benefits of error rate alerting, you need to get back to basics first.

8 Months of OnlineOrNot: From 7 Day MVP to Stable Product

September and October were relatively quiet, so I thought I would write a single article covering both months. While I'd normally try to write at least one useful article per month for OnlineOrNot's audience (as well as an update on how the business is going), I wrote no articles, and actually no code either. Instead, I packed up my life in Sydney, Australia, escaped lockdown, relocated to France with my wife, and just enjoyed living for a while.

Six months in: How the SaaS that was built in 7 days is going

A few weeks before I sat down to write this article, I reshared my two-month review of OnlineOrNot around the internet. Surprisingly, the article was quite popular. So I thought I'd clear up some confusion for the folks who only just read my two-month review: I started OnlineOrNot on February 25, 2021, shipped the first version for people to use on March 2, 2021, and here I am in August writing the six-month review.