in reply to Re^2: Count number of occurrences of a list of words in a file
in thread Count number of occurrences of a list of words in a file

Thank you for the compliment, and I'm glad that my suggestion was helpful to you.

I'm curious about your ultimate solution. As pointed out by Veltro here, the methods used by Athanasius, Tux and myself for enumeration of potential words are essentially identical. The differences in approach are between the split/exclusion and regex/extraction (as I would characterize them) methods used for finding candidate "words." Were you able to define a $rx_word regex object that had relatively few false positives (and, of course, absolutely no false negatives)? If so, it would be interesting to know what this definition is if it isn't so specific to your application as to be meaningless to others, or too proprietary.

It would be of even greater interest to me if you were able to get the Building Regex Alternations Dynamically approach working and if it is advantageous in terms of speed. As I mentioned in my reply (now with more updates!), my expectation was that a 60K word list was too big to be encompassed by a regex alternation; I no longer believe this. If you were able to use this technique and it proved beneficial, I'd like to hear about it!


Give a man a fish:  <%-{-{-{-<