Probably a slightly better approach would be to use \b and \w instead of defining a word and a separator from scratch, or even to use the POSIX equivalents (e.g. [[:alnum:]_] in place of \w) for more clarity.
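A minimal sketch of the \b / \w idea in Python, assuming the task is matching whole words or a multi-word phrase in some text. Python's re module does not support POSIX bracket classes such as [[:alnum:]], so only \b and \w are shown. The helper names words and contains_phrase are hypothetical, not from the original question.

```python
import re

# \w matches a word character (letters, digits, underscore; Unicode-aware for str patterns).
# \b is a zero-width boundary between a \w and a non-\w character (or the start/end of the string).
WORD = re.compile(r"\b\w+\b")


def words(text):
    """Return the words in text, using the built-in classes instead of a hand-rolled word set."""
    return WORD.findall(text)


def contains_phrase(text, phrase):
    """True if phrase occurs in text as whole words; 'cat' will not match inside 'concatenate'."""
    # Escape each word literally and allow any run of whitespace between consecutive words.
    body = r"\s+".join(re.escape(w) for w in phrase.split())
    return re.search(r"\b" + body + r"\b", text) is not None
```

For example, words("Hello, world! it's 42") gives ['Hello', 'world', 'it', 's', '42'], since the apostrophe is not a \w character and therefore splits "it's".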
As a further suggestion for the original question: an alternative would be to read the file in the desired chunks/units from the start. Each time through the loop, read one more word from the file into the comparison buffer and drop the oldest word off the other end of the buffer. In other words, use a sliding window/buffer technique with a step size of one word.
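A minimal Python sketch of that sliding window, assuming the goal is to find every position where a multi-word phrase starts and that a word is a run of \w characters. read_words and find_phrase are hypothetical names. A deque with maxlen acts as the buffer: appending to a full deque drops the oldest word automatically.

```python
import re
from collections import deque

WORD = re.compile(r"\w+")


def read_words(lines):
    """Yield words one at a time from an iterable of lines (e.g. an open file)."""
    for line in lines:
        yield from WORD.findall(line)


def find_phrase(lines, phrase):
    """Return the 0-based word positions at which phrase starts."""
    target = phrase.split()
    if not target:
        return []
    # The buffer holds exactly len(target) words once it has filled up.
    window = deque(maxlen=len(target))
    hits = []
    for i, word in enumerate(read_words(lines)):
        window.append(word)  # adds the newest word; drops the oldest once full
        if len(window) == len(target) and list(window) == target:
            hits.append(i - len(target) + 1)
    return hits
```

Usage would look like `with open("input.txt") as f: print(find_phrase(f, "cat sat"))`, where "input.txt" is a placeholder. Because the buffer carries words over from one line to the next, a phrase split across a line break is still found.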