in reply to Imploding URLs

A simpler alternative to the brute-force method would be to split each URL on \W (non-word characters) and use a hash to count how often each resulting "word" occurs. That avoids generating the canonical list of every substring of every URL, which is huge even for a single long URL: a string of length n has n(n+1)/2 substrings.
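Here is a minimal sketch of the split-and-count idea. It uses Python's `re.split(r"\W+", ...)` in place of a Perl-style `split /\W+/`, and a `collections.Counter` as the hash. The function name `word_frequencies` and the sample URLs are made up for illustration. It also assumes the goal is to find fragments that repeat across many URLs.

```python
import re
from collections import Counter

def word_frequencies(urls):
    """Count how often each \\W-delimited token appears across all URLs (hypothetical helper)."""
    counts = Counter()
    for url in urls:
        # Split on runs of non-word characters (like Perl's split /\W+/),
        # dropping the empty strings that leading/trailing separators produce.
        counts.update(tok for tok in re.split(r"\W+", url) if tok)
    return counts

# Illustrative input only; not taken from the original thread.
urls = [
    "http://www.example.com/docs/index.html",
    "http://www.example.com/docs/faq.html",
    "https://example.org/index.html",
]
freq = word_frequencies(urls)
for word, n in freq.most_common(3):
    print(word, n)
```

Each URL contributes only as many tokens as it has separator-delimited pieces. This is why the approach scales so much better than enumerating every substring.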

-QM
--
Quantum Mechanics: The dreams stuff is made of