How to Reliably Block AI Crawlers Using HAProxy Enterprise
The robots.txt file is a time-honored point of control that lets website publishers declare whether their sites should be crawled by bots of various kinds. However, AI crawlers operated by large language model (LLM) companies often ignore the directives in robots.txt and crawl your site regardless.
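For context, here is a minimal robots.txt that asks common AI crawlers not to index a site. The user-agent tokens shown (GPTBot for OpenAI, ClaudeBot for Anthropic, CCBot for Common Crawl) are publicly documented examples; the exact set of bots you list will depend on which crawlers you want to exclude:

```
# Ask specific AI crawlers to stay away from the entire site
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

# Allow all other crawlers
User-agent: *
Disallow:
```

The catch, as noted above, is that compliance with robots.txt is entirely voluntary: a crawler that chooses to ignore these rules faces no technical barrier, which is why server-side enforcement (for example, in HAProxy Enterprise) is needed.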