Investigating TCP timeouts at scale
At Mattermost, we’re on a quest to scale our application by an order of magnitude, from tens of thousands to hundreds of thousands of concurrently active users per installation. Scaling up is a complex effort involving expertise at several different levels. At its core, it’s a game of catching the next bottleneck, whether that’s application CPU usage, memory consumption, database throughput, networking, or any combination of these, among other causes.