Operations | Monitoring | ITSM | DevOps | Cloud

Latest Posts

Battling database performance

Earlier this year, we experienced intermittent timeouts in our application while interacting with our database over a period of two weeks. Despite our best efforts, we couldn’t immediately identify a clear cause; there were no code changes that significantly altered our database usage, no sudden changes in traffic, and nothing alarming in our logs, traces, or dashboards. During that two-week period, we deployed 24 different performance and observability-focused changes to address the problem.

How we built it: incident.io Status Pages

We kicked off 2023 with a new team and a new product to build - Status Pages. We wanted to build a solution we could ship to customers as quickly as possible, while making sure that it’s reliable, fast and beautiful. Here’s how that process played out over the course of three months.

Announcing incident.io Status Pages - powering clear external comms to build trust

Clear and frequent communication carries considerable weight in today's era of hyper-competition among businesses—especially during incidents. Because of this, status pages have become the go-to choice for companies looking to prioritize trust, transparency, and clarity with their customers, even during downtime. Unfortunately, current status page solutions have made these communications particularly frustrating and stressful.

Our A, B, Cs of external communications

Communication carries more weight than ever before. Businesses are so much more connected to their customers given the number of mediums they can communicate through; Twitter, Instagram, Facebook, and even TikTok. Because of this, it's essential to prioritize these lines of communication throughout your day-to-day. Some might even say that over-communicating is the best way forward. Why? No one likes a company that appears simply like a black box with zero insight into what's happening.

Building a culture of incident response

At Vanta, our goal is to nurture a positive security culture in everything we do—which is especially critical given that helping our customers improve their security and compliance posture starts with our own. Employees are the key to our security resilience, so we strive to build and support a strong culture of incident response in tandem. Here’s what that means to us at Vanta.

Developer environments should be cattle, not pets

Cattle, not pets is a DevOps phrase referring to servers that are disposable and automatically replaced (cattle) as opposed to indispensable and manually managed (pets). Local development environments should be treated the same way, and your tooling should make that as easy as possible. Here, I’ll walk through an example from one of my first projects at incident.io, where I reset my local environment a few times to keep us moving quickly.

What are you learning from your incidents?

Think about this—what was the last incident that challenged you? Did you learn anything from it? It will be shocking to no one to hear that we deal with our fair share of incidents. These run the gamut from tiny bugs to significant outages (thankfully, the latter happening only very rarely 😮‍💨). Either way, we always take the time to learn from them in some way. This might look like changes to our response processes or revisiting systems we’re using.

Embracing the active user paradox

Question—when was the last time you purchased a new product and sat down to read the manual end-to-end before getting started? Ask this question to a room of 10 people and you’d likely get one or two hand raises, even though reading first could save you time and preempt many of the questions you’re likely to ask. Herein lies the problem when it comes to creating a SaaS product.

Taking the fear out of migrations

Over the last 18 months at incident.io, we’ve done a lot of migrations. Often, a new feature requires a change to our existing data model. For us to be successful, it’s important that we can seamlessly transition from the old world to the new as quickly as we can. There are few things in software where I’d advocate a ‘one true way,’ but the closest I come is probably migrations. There’s a playbook that we follow to give us the best odds of a smooth switchover.