Rather than paying the overhead of splitting every chunk of source text into tokens, first check whether any search terms are present by using a regexp. The idea is to construct the regexp string "\b(?:word1|word2|...|wordn)\b" from the unique words in your search phrases, or from their stemmed forms. (You would need to expand '*' into something appropriate.) This regexp is then applied to each chunk of input text. If there is no match, the chunk gets appended directly to the output stream. Only if you get a match do you go through the overhead of breaking the text into tokens and applying your matching algorithm.
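Here is a rough sketch of that prefilter, in case it helps. The sub names (build_prefilter, highlight_chunk, process_chunks) are made up for illustration, and highlight_chunk is only a placeholder for your real tokenize-and-match code. I am also assuming that a trailing '*' means "any run of word characters" and that matching is case-insensitive; adjust both to suit your search syntax. Stemming is not handled here.

```perl
use strict;
use warnings;

# Build one prefilter regexp from the unique search words.
# Assumption: a trailing '*' expands to \w* (any run of word characters).
sub build_prefilter {
    my @words = @_;
    my (%seen, @alts);
    for my $w (@words) {
        my $word = $w;                       # work on a copy
        next if $seen{ lc $word }++;         # keep unique words only
        my $wild = ( $word =~ s/\*$// );     # strip trailing '*', remember it
        next unless length $word;
        my $pat = quotemeta $word;           # escape regexp metacharacters
        $pat .= '\w*' if $wild;
        push @alts, $pat;
    }
    return qr/(?!)/ unless @alts;            # no words: never matches
    my $alt = join '|', @alts;
    return qr/\b(?:$alt)\b/i;
}

# Placeholder for the expensive step (tokenizing plus your matching
# algorithm). Here it just wraps each hit in <b> tags.
sub highlight_chunk {
    my ($chunk, $re) = @_;
    $chunk =~ s/($re)/<b>$1<\/b>/g;
    return $chunk;
}

# Cheap test first; only matching chunks take the slow path.
sub process_chunks {
    my ($re, @chunks) = @_;
    my $out = '';
    for my $chunk (@chunks) {
        $out .= ( $chunk =~ $re ) ? highlight_chunk($chunk, $re) : $chunk;
    }
    return $out;
}
```

Build the regexp once, outside the loop over chunks, so that it is compiled a single time. How much this saves depends on how many chunks contain no search terms at all.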
In reply to Re: Context search term highlighting - Perl is too slow
by dws
in thread Context search term highlighting - Perl is too slow
by moseley