Great suggestion on the Regexp::List module. I hadn't investigated it before. I'm impressed with how it optimizes the list to minimize costly alternation. Efficiency seems to have been one of its primary design goals.
Does anyone know if there is a PPM3 build of it anywhere? I didn't find it on the ActiveState repositories. I would love to play with it.
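For comparison, here is the plain alternation that Regexp::List improves on, built by hand. This is a minimal sketch that does not use Regexp::List and is not its output. It escapes each keyword with quotemeta and joins them with `|` into one regex. The keyword list and the `find_keywords` name are hypothetical, for illustration only.

```perl
use strict;
use warnings;

# Hypothetical keyword list, for illustration only.
my @keywords = ( 'keyword1', 'keyword2', 'keyword3' );

# Plain alternation: escape each keyword so regex metacharacters are
# matched literally. Sorting longest-first is a common precaution when
# one keyword is a prefix of another.
my $alternation = join '|',
    map { quotemeta } sort { length($b) <=> length($a) } @keywords;
my $keyword_re = qr/\b(?:$alternation)\b/;

# Hypothetical helper: return every keyword found in one line.
sub find_keywords {
    my ($line) = @_;
    return $line =~ /($keyword_re)/g;
}

for my $line ( 'a line with keyword2 in it', 'a line with no keywords.' ) {
    my @found = find_keywords($line);
    print "'$line' contains: @found\n" if @found;
}
```

Every keyword becomes one branch of the alternation, so a naive engine may try each branch at each starting position. That per-position cost is what Regexp::List's list optimization aims to reduce.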
I toyed with another solution that turns the problem upside down: put the keywords in a hash, pull the words out of the file one at a time, and check whether each word exists in the keyword hash. For large keyword lists this could prove more efficient than simple alternation, since a hash lookup takes O(1) time on average, no matter how many keywords the hash holds:
```perl
use strict;
use warnings;

# The keywords live in the keys of a hash; the values are irrelevant.
my %keywords;
@keywords{ 'keyword1', 'keyword2', 'keyword3' } = ();

while ( <DATA> ) {
    chomp;
    # Pull out each word (word characters, apostrophes, hyphens)
    # and test it against the hash.
    while ( m/\b([\w'-]+)\b/g ) {
        print "'$_' contains keyword: $1\n" if exists $keywords{ $1 };
    }
}

__DATA__
a line with keyword2 in it
a line with keyword1 and keyword3.
a line with no keywords.
keyword1 can start a line too.
and a line can end in keyword2
```
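The same hash-lookup idea can be wrapped in a subroutine that returns the keywords found on a line, which makes it easier to reuse or test. This is a minimal sketch, assuming the keywords are known up front. The name `keywords_in` and the keyword list are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical keyword list; in practice it might be read from a file.
my %is_keyword;
@is_keyword{ qw(keyword1 keyword2 keyword3) } = ();

# Hypothetical helper: return the keywords found in one line, in order
# of appearance. Each candidate word costs one hash lookup, so the
# per-word cost does not grow with the size of the keyword list.
sub keywords_in {
    my ($line) = @_;
    my @found;
    while ( $line =~ m/\b([\w'-]+)\b/g ) {
        push @found, $1 if exists $is_keyword{ $1 };
    }
    return @found;
}

for my $line ( 'a line with keyword1 and keyword3.', 'a line with no keywords.' ) {
    my @found = keywords_in($line);
    print "'$line' contains: @found\n" if @found;
}
```

The trade-off is that this only finds keywords that are whole words under the `[\w'-]+` definition; a multi-word keyword would never match, which alternation handles naturally.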
Enjoy.
Dave
In reply to Re^3: searching for keywords by davido, in thread searching for keywords by Anonymous Monk