
Incident Review For the Facebook Outage: When Social Networks Go Anti-social

The following is an analysis of the Facebook incident on October 4, 2021. In a highly unusual turn of events, Facebook, Instagram, WhatsApp, Messenger, and Oculus VR were down simultaneously around the world for an extended period on Monday. The social network and some of its key apps started to display error messages before 16:00 UTC. They stayed down until 21:05 UTC, when service began to gradually return to normal.

How Netflix, Lyft, Slack, And Other Top Tech Brands Manage Cloud Costs

Breakthroughs in engineering best practices often stem from a handful of top tech companies. Many of them share their behind-the-scenes stories at conferences, in blogs and slide decks, or through open source code. These companies invest millions of dollars and dedicated headcount in optimizing everything from uptime to engineering velocity, so why wouldn’t you look to them for inspiration?

The Blog Is Dead; Long Live the Blog

Ever since the very beginning, Honeycomb has poured a lot of heart and soul into our blog. We take pride in knowing it isn’t just your typical stream of feature updates and marketing promotions, but rather real, meaty pieces of technical depth, practical how-to guides, highly detailed retrospectives, and techno-philosophical pieces. One of my favorite things is when people who aren’t customers tell me how much they love our blog.

Leverage Correlation Analysis to Address the Challenges of Digital Payments

In the first four parts of our series on correlation analysis, we discussed the importance of this capability for root cause analysis in a number of business use cases, and then specifically in the context of promotional marketing, telco, and algorithmic trading. In this post, we walk through how to leverage correlation analysis to address the challenges of ensuring a seamless online payment experience for the end user.

Deploy the RESTMon Microservice in Minutes

Within any enterprise, IT operations teams use a variety of solutions to monitor their technology ecosystem. These products are often business critical and cannot easily be replaced or migrated. Ultimately, it’s important that teams can analyze and correlate data from these different tools so they can produce the insights they need to improve decision making. To help address these requirements, Broadcom offers RESTMon.

Data Centers & the Impending Water Crisis - 5 Experts to Follow

‘Power-hungry data centers greedily devour diminishing water supply!’ – This might be my own frequency bias, but I feel like every headline today references data centers and the existential threat they pose to the environment. Perhaps it’s the sensational numbers and estimates that pull me in: So here I am, uncomfortably deep in the rabbit hole, reading about adiabatic processes and wet cooling towers (experts in the U.K.