Building a scalable, easy-to-use web crawler for Elastic Enterprise Search
Indexing the web is hard. There’s a nearly infinite supply of misbehaving sites, misapplied (or ignored) standards, duplicate content, and corner cases to contend with. It’s a big task to create an easy-to-use web crawler that’s thorough and flexible enough to account for all the different content it encounters.