Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi,
I want to search through a file for some keywords, and if a line in the file contains one of them, I want to print that line to another file.
I am trying the following, but it fails with this error:

    syntax error at test.pl line 9, near ") print"
    Execution of test.pl aborted due to compilation errors.

    @keywords = ("keyword1", "keyword2", "keyword3");
    while (<SEARCHFILE>) {
        if ($_ = @keywords)
            print $_;
    }

Contents of File1:

    somelines...
    somelines...
    somewords...keyword1..somewords
    somelines...
    somewords... keyword2...somewords...
I am very new to Perl. Can someone please help me with this? Thank you.

Re: searching for keywords
by bobf (Monsignor) on Jan 18, 2006 at 03:04 UTC

    The syntax error occurs because your 'if' statement has no block; only the 'while' does. In Perl the braces after if (...) are mandatory. You could get by without the block if you turned it into a statement modifier (see perlsyn), as follows:

    print $_ if( $_ =~ m/$keyword/ );

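    There's more to it than that, though: you're using the assignment operator ('=', see perlop) where you appear to want a pattern match. '$_ = @keywords' evaluates @keywords in scalar context, so it assigns the number of keywords (3) to $_, and the condition is always true. The binding operator '=~' is what applies a match to a string, hence the change I made to your code in the example above. If @keywords is large, use Super Search to find examples of matching against an array of words. If it is reasonably small, you can use alternation (see perlre):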

    my $pattern = join( '|', @keywords );
    if( $_ =~ m/$pattern/ ) { print $_; }
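To see concretely why the original '$_ = @keywords' condition misbehaves and how '=~' differs, here is a small sketch. The sample line and variable names are illustrative assumptions, not part of the original question.

```perl
use strict;
use warnings;

my @keywords = ("keyword1", "keyword2", "keyword3");
my $line     = "nothing interesting here\n";

# The original condition: '=' assigns. An array on the right of a scalar
# assignment is evaluated in scalar context, which yields its element
# count, so $copy becomes 3 and the condition is always true.
my $copy     = $line;
my $assigned = ($copy = @keywords);

# The corrected condition: '=~' binds $line to a pattern match, which is
# false here because the line does not contain 'keyword1'.
my $matched = ($line =~ m/keyword1/) ? 1 : 0;

print "'=' left \$copy as '$copy' (condition value $assigned)\n";
print "'=~' match result: $matched\n";
```

So with '=' every line would be "matched", and printing $_ afterwards would print the count rather than the line.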

    Oh, and you should be using strict and warnings; you may also find diagnostics helpful.

    HTH

    ikegami is absolutely right about escaping special characters, of course (see quotemeta). Thanks and ++ for pointing that out!

    Update: full code example, with escaped characters

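      m/$keyword/
      is wrong: you need to escape regex special characters (such as . * ? ( [ ), or a keyword containing them will be treated as pattern syntax instead of literal text. A simple way of doing this is
      m/\Q$keyword/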
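A minimal sketch of what goes wrong without \Q, assuming a hypothetical keyword 'v1.2' whose '.' would otherwise act as a wildcard:

```perl
use strict;
use warnings;

# Hypothetical keyword containing '.', which in a regex matches any character.
my $keyword = 'v1.2';
my $line    = "upgraded to v132 today\n";

# Interpolated as-is, '.' acts as a wildcard, so 'v132' matches by accident.
my $unescaped = ($line =~ m/$keyword/) ? 1 : 0;

# \Q quotes metacharacters up to \E, so only a literal 'v1.2' would match.
my $escaped = ($line =~ m/\Q$keyword\E/) ? 1 : 0;

print "unescaped: $unescaped, escaped: $escaped\n";
```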

      my $pattern = join('|', @keywords );
      has the same problem. Use
      my $pattern = join('|', map quotemeta, @keywords);
      instead.
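A sketch of the same issue with the joined alternation, using hypothetical keywords that contain metacharacters. Without quotemeta, '(draft' leaves a group unclosed and the pattern fails to compile; with it, everything matches literally. Compiling once with qr// is an added choice here, not something the reply prescribes.

```perl
use strict;
use warnings;

# Hypothetical keywords: '(' and '.' are regex metacharacters.
my @keywords = ('(draft', 'keyword2', 'a.b');

# Unescaped, '(draft' opens a group that is never closed, so compiling
# the pattern dies; eval returns undef when that happens.
my $raw    = join '|', @keywords;
my $raw_ok = defined eval { qr/$raw/ } ? 1 : 0;

# Escaped with quotemeta first, the pattern compiles and matches literally.
my $pattern = join '|', map quotemeta, @keywords;
my $re      = qr/$pattern/;

my @lines = (
    "see the (draft) notes\n",   # contains a literal '(draft'
    "axb only\n",                # 'a.b' no longer matches 'axb'
    "has a.b literally\n",       # contains a literal 'a.b'
    "plain text\n",
);
my @matches = grep { $_ =~ $re } @lines;
print for @matches;
```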

      If the list of words is long, you can speed things up a lot by using Regexp::List:
      my $pattern = Regexp::List->new->list2re(@keywords);

      All together, we get:

      use strict;
      use warnings;
      use Regexp::List ();

      my @keywords = ("keyword1", "keyword2", "keyword3");

      my $pattern = Regexp::List->new->list2re(@keywords);
      #my $pattern = join('|', map quotemeta, @keywords);  # Alternative

      # SEARCHFILE is assumed to have been opened earlier.
      while (<SEARCHFILE>) {
          if ($_ =~ $pattern) {  # or just: if (/$pattern/) {
              print $_;          # or just: print;
          }
      }
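The loop above prints to standard output, but the original question asked for matching lines to go to another file. Here is a sketch that takes explicit input and output handles, using the core-only quotemeta alternative. The sub name copy_matching_lines is hypothetical, and the demo uses in-memory filehandles (open on a scalar reference) so it runs without real files; the sample text is taken from the question.

```perl
use strict;
use warnings;

# Copy every line from $in that contains any of @keywords to $out and
# return how many lines were written. Keywords are quotemeta-escaped
# so they match literally.
sub copy_matching_lines {
    my ($in, $out, @keywords) = @_;
    return 0 unless @keywords;   # an empty pattern would match every line
    my $pattern = join '|', map quotemeta, @keywords;
    my $re      = qr/$pattern/;
    my $count   = 0;
    while (my $line = <$in>) {
        next unless $line =~ $re;
        print {$out} $line;
        $count++;
    }
    return $count;
}

# Demo with in-memory filehandles. For real files, open $in with '<'
# and $out with '>' on your own paths instead.
my $text = "somelines...\n"
         . "somewords...keyword1..somewords\n"
         . "somelines...\n"
         . "somewords... keyword2...somewords...\n";
my $result = '';
open(my $in,  '<', \$text)   or die "Cannot open input: $!";
open(my $out, '>', \$result) or die "Cannot open output: $!";

my $written = copy_matching_lines($in, $out, "keyword1", "keyword2", "keyword3");
close $in;
close $out;
print $result;
```

Passing handles in, rather than file names, keeps the sub easy to test and lets the caller decide where output goes.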

        Great suggestion on the Regexp::List module. I hadn't investigated it before. I'm impressed with how it optimizes the list to minimize costly alternation. Efficiency seems to have been one of the primary design philosophies.

        Does anyone know if there is a PPM3 build of it anywhere? I didn't find it on the ActiveState repositories. I would love to play with it.

        I toyed with another solution that turns the problem upside down: put the keywords in a hash, pull individual words out of each line of the file one by one, and check whether each word exists in the keyword hash. For large keyword lists this could prove more efficient than simple alternation, since hash lookups take O(1) time on average:

        use strict;
        use warnings;

        my %keywords;
        @keywords{ 'keyword1', 'keyword2', 'keyword3' } = ();

        while( <DATA> ){
            chomp;
            while( m/\b([\w'-]+)\b/g ) {
                print "'$_' contains keyword: $1\n" if exists $keywords{ $1 };
            }
        }

        __DATA__
        a line with keyword2 in it
        a line with keyword1 and keyword3.
        a line with no keywords.
        keyword1 can start a line too.
        and a line can end in keyword2
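The hash approach and the alternation approach are not quite equivalent: alternation matches a keyword anywhere in the line, while the hash lookup only matches whole extracted words. A small sketch of the difference, using a hypothetical line containing 'keyword10':

```perl
use strict;
use warnings;

my @keywords = ('keyword1', 'keyword2', 'keyword3');

# Alternation, as in the earlier replies: matches anywhere in the line.
my $pattern = join '|', map quotemeta, @keywords;
my $re      = qr/$pattern/;

# Hash, as in the sketch above: matches whole extracted words only.
my %is_keyword;
@is_keyword{@keywords} = ();

my $line = "only keyword10 appears here\n";

my $by_regex = ($line =~ $re) ? 1 : 0;
my $by_hash  = (grep { exists $is_keyword{$_} } $line =~ m/\b([\w'-]+)\b/g) ? 1 : 0;

print "regex: $by_regex, hash: $by_hash\n";
```

Which behaviour is wanted depends on whether partial-word hits such as 'keyword10' should count as a match for 'keyword1'.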

        Enjoy.


        Dave