Operations | Monitoring | ITSM | DevOps | Cloud

Rootly

But It's Not Our Fault! When Third-party Incidents Affect Your Service

Very few SaaS products exist completely independently. Between cloud service providers, payment processors, content delivery networks, and more, chances are you rely on external systems to keep your product working. When these systems fail, it can leave you feeling pretty helpless. In some cases you might have fallback options, but oftentimes all you can do is wait for recovery and clean up the fallout.

Rootly Raises $12 Million from Renegade Partners, Google Gradient Ventures, & XYZ Ventures

We are excited to announce that we have raised a $12M round of financing led by Renegade Partners with participation from Google Gradient Ventures (Google’s AI-focused venture fund) and XYZ Ventures. This brings our total funding to date to $15.2M ($20M CAD) alongside our other existing investors Y Combinator and 8VC.

Kubernetes Incident Management Best Practices

Creating just any infrastructure on Kubernetes is not enough. There are so many basic configurations you could apply and create the infrastructure for your application for the time being and it might work just fine. The incident responses won’t always remain 100% reliable. You will run into newer potholes, and that’s okay.

The Medium is the Message: How to Master the Most Essential Incident Communication Channels

We’ve all seen it: a company experiencing a major incident and going radio silent, leaving their customers to wonder “Are they doing something about this?!”. If you’ve ever been on the inside of something like this, you know the answer is most likely yes, there are people working hard to put out the fire as quickly as possible. But when it comes to incidents, perception is reality for customers.

Improve Visibility and Capture More Data with Triage Incidents

As new incidents emerge, there are often many unknowns about the size, severity, and cause of the problem. Sometimes it’s not clear if the problem is an incident at all. That’s where introducing a triage stage to your incident management process can help. In this post, we’ll look at the benefits of adding a triage layer to your incident management, and how Rootly’s Triage feature allows you to seamlessly transition from triage to real incident (or false alarm).

Lessons from the CircleCI Security Incident

In some respects, security and reliability are competing priorities. Security controls may reduce reliability, and responding to security incidents may require mission-critical systems to be paused or shut down until they're secure. The recent security incident involving CircleCI, however, shows that it's not always necessary to choose between prioritizing security or reliability.

How Many SREs Does Your Company Need? Here's How to Decide

So you’ve decided to take advantage of Site Reliability Engineering by hiring SREs for your company. Now, you have a second decision to make: Exactly how many SREs to hire. Do you need just one or two SREs? Or should you build a sprawling SRE team, with a dozen or more SREs on hand to support your organization’s reliability needs? The answers to these questions will, of course, vary; every business’s needs are different.

Monitoring Your Platform From Multiple Locations

Mature start-ups and scale-ups create wonderful and challenging environments for Engineers. As the product they’re creating matures and the brand becomes a successful one, the user base generally starts growing, and, for some companies, in places they might not expect it to grow. As that happens, new challenges arise for Engineers. One of these challenges is pretty straightforward to guess. Basically having a particular product available throughout different regions of the world.