in reply to Perl regex question

The previous suggestion to use a module is fine, provided you read and understand the module documentation, so that you know what it's doing for you.

But for a case like this, a simple, direct use of the information in perlre, and the description of "qr" in perlop, would do just as well. You want to make sure that you don't get "false-alarm" matches, such as a match on the target word "part" when the text file contains "compartment". So you want to enclose the alternation of words to match in parentheses, with a word boundary (\b) on each side, like this:

    my @target_words   = qw/list of words/;    # or read from a file, or whatever
    my $joined_targets = join "|", @target_words;
    my $match_regex    = qr/\b($joined_targets)\b/;

    for my $file ( @file_list ) {
        open( my $fh, "<", $file ) or die "Can't open $file: $!";
        while ( <$fh> ) {
            if ( /$match_regex/ ) {
                my $matched_word = $1;    # the target word that matched
                store_to_db( $file, $matched_word );
                last;                     # stop at the first match in this file
            }
        }
        close $fh;
    }

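To see the word boundaries at work without touching the disk, here is a small self-contained sketch of the same loop. It assumes made-up file contents and reads them through in-memory filehandles (opening a scalar reference), and it records hits in a hash because store_to_db() is not defined in the post; the hash is only a stand-in.

```perl
use strict;
use warnings;

my @target_words   = qw/part work/;
my $joined_targets = join "|", @target_words;
my $match_regex    = qr/\b($joined_targets)\b/;

# Hypothetical file contents; in-memory filehandles stand in for real files.
my %contents = (
    'notes.txt' => "the compartment was empty\nnothing else here\n",
    'memo.txt'  => "first line\nthis part is broken\nmore work later\n",
);

my %found;    # file name => first target word found (stand-in for store_to_db)
for my $file ( sort keys %contents ) {
    open( my $fh, "<", \$contents{$file} ) or die "Can't open $file: $!";
    while ( my $line = <$fh> ) {
        if ( $line =~ $match_regex ) {
            $found{$file} = $1;
            last;    # only the first match per file, as in the loop above
        }
    }
    close $fh;
}

print "$_: $found{$_}\n" for sort keys %found;    # prints "memo.txt: part"
```

"compartment" in notes.txt is not reported, because there is no word boundary between the "m" and the "p".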
Of course, if a target word like "work" is supposed to match on tokens like "works" and/or "worked", you should just make sure your list includes all the appropriate forms for each word.
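One way to keep such a list manageable is to write the forms out per base word and flatten them into the alternation. This is a sketch only; the %forms table is a hypothetical hand-written list, not something generated by a stemmer.

```perl
use strict;
use warnings;

# Hypothetical table of inflected forms, written out by hand.
my %forms = (
    work => [qw/work works worked working/],
    part => [qw/part parts/],
);

my @target_words   = map { @{ $forms{$_} } } sort keys %forms;
my $joined_targets = join "|", @target_words;
my $match_regex    = qr/\b($joined_targets)\b/;

my @matched;
for my $text ( "she worked late", "he works here", "a workbench", "spare parts" ) {
    push @matched, $1 if $text =~ $match_regex;
}
print "@matched\n";    # prints "worked works parts"
```

"workbench" still does not match: no listed form is followed by a word boundary there.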

Re^2: Perl regex question
by repellent (Priest) on Mar 08, 2009 at 18:58 UTC
    It may be better to treat the words literally rather than as patterns:
    my $joined_targets = join "|", map { quotemeta } @target_words;
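The difference only shows up when a target contains a regex metacharacter. In this sketch the target "3.5" is a made-up example: used as a pattern, its "." matches any character, so it hits "345"; after quotemeta it needs a literal dot.

```perl
use strict;
use warnings;

# Hypothetical target containing a regex metacharacter (".").
my @target_words = ("3.5");
my $text         = "version 345 released";

my $as_pattern = join "|", @target_words;
my $as_literal = join "|", map { quotemeta } @target_words;

my $pattern_hit = $text =~ /\b($as_pattern)\b/ ? 1 : 0;    # "." matches the "4"
my $literal_hit = $text =~ /\b($as_literal)\b/ ? 1 : 0;    # requires a real "."

print "pattern: $pattern_hit, literal: $literal_hit\n";    # prints "pattern: 1, literal: 0"
```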
      As often as not, treating the strings as patterns is more sensible than treating them as literals. It depends on what the programmer wants to accomplish with a particular app, so the programmer should make this a deliberate choice for each app.
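Making that choice explicit can be as simple as a flag when the regex is built. The sketch below assumes a list whose entries were deliberately written as pattern fragments; build_regex() and matches_in() are hypothetical helpers, not part of any module.

```perl
use strict;
use warnings;

# Hypothetical helper: build the alternation from regex fragments,
# or from literal strings when $treat_as_literal is true.
sub build_regex {
    my ( $treat_as_literal, @targets ) = @_;
    my @parts = @targets;
    @parts = map { quotemeta } @parts if $treat_as_literal;
    my $joined = join "|", @parts;
    return qr/\b($joined)\b/;
}

# Hypothetical helper: collect the captured word from each matching text.
sub matches_in {
    my ( $regex, @texts ) = @_;
    my @matched;
    for my $text (@texts) {
        push @matched, $1 if $text =~ $regex;
    }
    return @matched;
}

# Entries deliberately written as patterns.
my @targets = ( 'colou?r', 'work(?:s|ed)?' );
my @texts   = ( "the colour red", "it worked", "no match here" );

my @as_patterns = matches_in( build_regex( 0, @targets ), @texts );
my @as_literals = matches_in( build_regex( 1, @targets ), @texts );

print "patterns: @as_patterns\n";                     # prints "patterns: colour worked"
print "literals: ", scalar(@as_literals), " hits\n";  # prints "literals: 0 hits"
```

With a list like this, quoting the entries would defeat their purpose; with a list of plain words that might contain ".", "+" or "?", quoting is the safer default.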