I appreciate the technique, but the keywords are drawn from the English language (plus proper nouns), and the number of keyworded items could be in the tens of thousands, so I could not use any technique that limits the alphabet to 255 or 256 symbols.
I'll be adding more information and test cases shortly.