in reply to Context search term highlighting - Perl is too slow
Rather than paying the cost of splitting every chunk of source text into tokens, first check with a regexp whether any search term is present at all. The idea is to build the regexp string "\b(?:word1|word2|...|wordn)\b" from the unique words in your search phrases, or from their stemmed forms. (You would need to expand '*' into something appropriate.) Apply this regexp to each chunk of input text. If there is no match, append the chunk directly to the output stream. Only if there is a match do you go through the overhead of breaking the text into tokens and applying your matching algorithm.
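Here is a minimal sketch of that prefilter, untested against your data. The names build_prefilter and process_chunks are made up for illustration, the slow path is passed in as a code ref standing in for your existing tokenize-and-match routine, and expanding a trailing '*' to \w* is only my guess at "something appropriate". Note that \b only behaves as expected when the search words begin and end with word characters.

```perl
use strict;
use warnings;

# Build one precompiled regexp from the unique words in the search phrases.
# ASSUMPTION: a trailing '*' means "any word suffix" and is expanded to \w*.
sub build_prefilter {
    my @phrases = @_;
    my (%seen, @alts);
    for my $raw (map { split ' ', lc $_ } @phrases) {
        next if $seen{$raw}++;                 # keep each word only once
        my $word = $raw;
        my $wild = $word =~ s/\*\z//;          # strip and remember a trailing '*'
        next unless length $word;
        push @alts, quotemeta($word) . ($wild ? '\w*' : '');
    }
    return unless @alts;                       # nothing to search for
    my $alt = join '|', @alts;
    return qr/\b(?:$alt)\b/i;
}

# Run the cheap regexp test on every chunk; call the expensive routine
# (hypothetical code ref $slow_path) only for chunks that can contain a hit.
sub process_chunks {
    my ($prefilter, $slow_path, @chunks) = @_;
    my $output = '';
    for my $chunk (@chunks) {
        if (defined $prefilter && $chunk =~ $prefilter) {
            $output .= $slow_path->($chunk);   # tokenize and highlight
        }
        else {
            $output .= $chunk;                 # no terms present: pass through
        }
    }
    return $output;
}

# Usage, with a stand-in slow path that just brackets the chunk.
my $prefilter = build_prefilter('quick brown', 'fox*');
my $slow_path = sub { return '[' . $_[0] . ']' };
print process_chunks($prefilter, $slow_path,
    "Nothing of interest here. ", "The foxes ran off. "), "\n";
```

Compiling the alternation once with qr// and reusing it for every chunk is the point of the exercise: the per-chunk cost for text without any search term is a single regexp match. How much this saves depends on what fraction of your chunks contain no terms at all, so it is worth benchmarking on real input.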