Operations | Monitoring | ITSM | DevOps | Cloud

Facebook, Instagram, and Whatsapp's Outage - Understanding MTTR

Yesterday the most used social media platforms in the world were inaccessible for 6 hours straight. Later, in a press release, Facebook revealed that the outage was due to configuration changes in their routers. There is no doubt that Facebook has an intense incident response plan, yet a small blind spot resulted in a significant business interruption. So how do we avoid this? The truth is, outages and performance issues are bound to happen in any network.

PagerDuty Integration Spotlight: HashiCorp Terraform

Manage your PagerDuty account objects with Terraform! Reap all the benefits of infrastructure as code and give your teams the flexibility they need to manage their services in real time. As infrastructure stacks grow increasingly more complex and involve an ever-growing number of services and systems, teams have looked to abstract configuration to its own layer of code. This concept of configuring infrastructure as code is gaining traction throughout the industry for a variety of reasons.

Cloud 66 Feature Highlight: Registered Servers

What are Registered Servers? Registered Servers are a simple way to create a pool of servers on private and public cloud that can be used on any stack and configuration. Applications can be deployed across a hybrid of cloud and registered servers, in this way you could have a dedicated server for your database and burst cloud servers for your front end.

Adding Search to Rails with MeiliSearch

There are many ways to add search functionality to a Rails application. While many Rails developers choose to use the native search functionality built into popular databases like MySQL and Postgres, others need more flexible or feature rich search functionality. ElasticSearch is probably the most well known option available but it has its own issues. Firstly, it is a resource hungry beast. To run ElasticSearch properly in production, you need a few beefy servers.

The Aftermath of the Facebook 6-Hour Outage

Less than 24 hours ago, the world came to a “social standstill” as Facebook, and its sister companies, WhatsApp and Instagram, became unavailable, leaving its 3.5 billion users in a flap. The outage, which lasted almost 6 hours, shut off access for users and businesses all over the world and caused ripple effects that we will likely continue to see in the immediate (and perhaps not-so-immediate) future.

The Future of AIOps Includes an ITOps Strategy

One of the questions I get asked a lot by customers, prospects, and partners is, “Will AIOps make them irrelevant?” To them, AIOps is often equivalent to automated remediation; an AIOps system automatically detects an incident and kicks off a remediation process in response to this incident, knowing exactly what process will solve the problem. IT is out of the loop, data centers and NOCs just keep humming along unattended, end users are none the wiser.