Re: Context search term highlighting

One common optimization technique is to apply an inexpensive "qualification" test to determine if a more expensive test is necessary. I'm wondering if that might be possible here.

Rather than going through the overhead of splitting every chunk of source text into tokens, check first to see if any search terms are present by using a regexp. The idea is to construct the regexp string "\b(?:word1|word2|...|wordn)\b" from the unique words in your search phrases or their stemmed forms. (You would need to expand '*' into something appropriate.) This regexp is then applied to each chunk of input text. If there is no match, the chunk of text gets appended directly to the output stream. Only if you get a match do you go through the overhead of breaking up the text into tokens and applying your matching algorithm.
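
Something like the following is what I have in mind. It is only a rough sketch: build_qualifier and highlight_chunks are names I made up, I'm assuming '*' should expand to \w*, I'm matching case-insensitively, and the $expensive callback stands in for your existing tokenize-and-match code. It doesn't do any stemming, and the \b anchors only behave if your terms begin and end with word characters.

```perl
use strict;
use warnings;

# Build a single alternation regexp from the unique words in the search
# phrases. Assumption: '*' in a term means "any run of word characters".
sub build_qualifier {
    my @phrases = @_;
    my (%seen, @alts);
    for my $phrase (@phrases) {
        for my $word (split ' ', $phrase) {
            next if $seen{lc $word}++;      # keep each word only once
            my $pat = quotemeta $word;      # escape regexp metacharacters
            $pat =~ s/\\\*/\\w*/g;          # escaped '*' becomes \w*
            push @alts, $pat;
        }
    }
    return unless @alts;                    # no terms, no qualifier
    my $alternation = join '|', @alts;
    return qr/\b(?:$alternation)\b/i;
}

# Run the cheap test on each chunk; call the expensive highlighter
# (a hypothetical callback here) only for chunks that might contain a term.
sub highlight_chunks {
    my ($qualifier, $expensive, @chunks) = @_;
    my $out = '';
    for my $chunk (@chunks) {
        if (defined $qualifier && $chunk =~ $qualifier) {
            $out .= $expensive->($chunk);   # tokenize and match for real
        }
        else {
            $out .= $chunk;                 # pass through untouched
        }
    }
    return $out;
}
```

You would build the qualifier once per search and reuse it for every chunk. How much this buys you depends on how many chunks contain no search terms at all; if nearly every chunk matches, you just pay for the extra regexp.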
