shoness has asked for the wisdom of the Perl Monks concerning the following question:

Learned Monks,

I have a Perl array that I've filled with interesting things I want to look for within a file. I can't figure out how to perform this search more elegantly than with a loop that looks for each item in turn with brute force:

    #!/usr/local/bin/perl -w
    use strict;

    my @names = ("john", "paul", "george", "ringo");

    while (<STDIN>) {
        chomp;
        my $hit = 0;
        foreach my $name (@names) {
            if (m/^static\s+\w+\s+(${name})\W.*/) {
                $hit = $1;
                last;
            }
        }
        if ($hit) {
            print "Beatle method \"$hit\" found on this line: $_\n";
        }
        else {
            print "No Beatle method found on this line: $_\n";
        }
    }

If my array always had four members, I could optimize the code above to something like this:

    while (<STDIN>) {
        chomp;
        if (m/^static\s+\w+\s+($names[0]|$names[1]|$names[2]|$names[3])\W.*/) {
            print "Beatle method \"$1\" found on this line: $_\n";
        }
        else {
            print "No Beatle method found on this line: $_\n";
        }
    }

The array can have one or many entries however. Ideally I'd like a one-liner like above, but for any size array:

    # Note that this line isn't supposed to compile...
    if (m/^static\s+\w+\s+(???@names???)\W.*/) {

FYI... Data like this:

    static int lars(...);
    static bool george(...);
    static int thom(...);

Should produce output like this:

    No Beatle method found on this line: static int lars(...);
    Beatle method "george" found on this line: static bool george(...);
    No Beatle method found on this line: static int thom(...);

Thanks for your help!

Re: RegExp to Search All Array Members?
by RMGir (Prior) on Jun 19, 2007 at 09:34 UTC
    You were SO close.

        my $regexStr = "^static\\s+\\w+\\s+("    # Thanks moritz!
            . (join "|", map quotemeta, @names)
            . ")\\W.*";

        while (<STDIN>) {
            chomp;
            if (/$regexStr/) {
                # rest is the same as your "optimized" attempt
            }
        }

    Mike

    Edit: added map quotemeta as suggested by moritz
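
A minimal sketch of why the `map quotemeta` step matters, assuming a name that contains a regex metacharacter. The helper `build_pattern` and the sample line are hypothetical and not from the thread. Without escaping, the `.` in `rin.go` matches any character, so an unrelated identifier slips through.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): build the alternation either
# from the raw names or from names passed through quotemeta.
sub build_pattern {
    my ($escape, @names) = @_;
    my @parts = $escape ? (map { quotemeta($_) } @names) : @names;
    my $alt   = join "|", @parts;
    return qr/^static\s+\w+\s+($alt)\W/;
}

my @names   = ("rin.go");
my $raw     = build_pattern(0, @names);
my $escaped = build_pattern(1, @names);

my $line = "static int rinXgo(...);";

# The raw pattern matches because "." stands for any character.
print "raw pattern matches: $line\n" if $line =~ $raw;

# The escaped pattern requires a literal dot, so this line is rejected.
print "escaped pattern does not match: $line\n" unless $line =~ $escaped;
```

The same escaping protects against names containing `?`, `+`, `(` and similar characters, which would otherwise change the meaning of the pattern or make it fail to compile.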

      Having just read the section in Damian's Object Oriented Perl book about the regular use of qr, this looked like a good chance to try it out. With qr, you create a reference to a compiled regular expression, which you can store in a scalar variable. When that scalar is used in a regex later on, the compilation work to prepare the referenced regex doesn't have to be repeated each time through your loop, avoiding a good amount of regex overhead.

      Here's the snippet again with some tweaks to your example. Note, whenever I store a reference in a scalar, I like to suffix the identifier with _Xr, where X stands for the kind of thing that's supposed to be referenced (in this case r for "regex") and the closing r indicates that the scalar is supposed to hold a reference.

          #!/usr/local/bin/perl -w
          use strict;

          my @names = ("John?", "Paul", "George", "Rin.go");
          my $regexStr = "^static\\s+\\w+\\s+("    # Thanks moritz!
              . (join "|", map quotemeta, @names)
              . ")\\W.*";
          print "$regexStr\n\n";

          my $regexStr_rr = qr{$regexStr}i;    # or "cloister" the 'i' in $regexStr

          while (<DATA>) {
              chomp;
              my ( $hit ) = $_ =~ /$regexStr_rr/;
              if ($hit) {
                  print "Beatle method \"$hit\" found on this line: $_\n";
              }
              else {
                  print "No Beatle method found on this line: $_\n";
              }
          }
          __DATA__
          static int lars(...);
          static bool george(...);
          static int thom(...);

        I have tried to get this working, but have been unsuccessful. I am a noob to programming and perl so patience would be appreciated.

        The search NEVER matches. I made sure the case is exactly the same in the doc and still no luck

        Here is the code

            #!/usr/bin/perl -w
            use strict;
            use warnings;

            open(FH, "Wachterpdf2txt.txt") or die "We have a problem: $!";

            my @invoiceSearch = ("phone", "hunt", "dial", "tone", "static", "18d",
                "system", "voice", "numbers", "voicemail", "MLX201", "programming",
                "extension", "processor", "block", "mls", "programmed", "rollover",
                "extension", "partner", "crosstalk", "merlin", "ringing");

            my $regexStr = "^static\\s+\\w+\\s+("
                . (join "|", map quotemeta, @invoiceSearch)
                . ")\\W.*";
            print "$regexStr\n\n";

            my $regexStr_rr = qr{$regexStr}i;    # or "cloister" the 'i' in $regexStr

            while (<FH>) {
                chomp;
                my ( $hit ) = $_ =~ /$regexStr_rr/;
                if ($hit) {
                    print "Beatle method \"$hit\" found on this line: $_\n";
                }
                else {
                    print "No Beatle method found on this line: $_\n";
                }
            }

        What I can't seem to find anywhere is what the ^static part means in the regex variable. I am pulling in a text document that was converted from .pdf. I have stripped out all spaces and most punctuation. The text document looks fine to me.

        Any help would be appreciated!
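
A hedged sketch of what the `^static\s+\w+\s+` prefix requires, and of one possible alternative for free-form text. In the pattern from the thread, `^` anchors the match at the start of the line and `static` is literal text, so a keyword only counts when it follows `static <word> ` at the very beginning of a line, as in the original C-style declarations. The sample lines, the shortened keyword list, the helper `first_hit`, and the `$anywhere` pattern are assumptions for illustration; the `\b` word boundaries assume the words in the file are still separated by whitespace or punctuation.

```perl
use strict;
use warnings;

# Return the first captured keyword, or undef when the pattern does not match.
sub first_hit {
    my ($re, $line) = @_;
    my ($hit) = $line =~ $re;
    return $hit;
}

# Shortened, hypothetical keyword list.
my @keywords = ("dial", "tone", "extension");
my $alt = join "|", map { quotemeta($_) } @keywords;

# Pattern shape from the thread: keyword must follow "static <word> " at line start.
my $anchored = qr/^static\s+\w+\s+($alt)\W.*/i;

# Assumed alternative: keyword may appear anywhere on the line as a whole word.
my $anywhere = qr/\b($alt)\b/i;

# Hypothetical sample lines standing in for the converted text.
my @lines = (
    "static int dial(...);",
    "Customer reports no dial tone on extension 12",
);

for my $line (@lines) {
    for my $pair (["anchored", $anchored], ["anywhere", $anywhere]) {
        my $hit = first_hit($pair->[1], $line);
        printf "%-8s %-5s %s\n", $pair->[0], (defined($hit) ? $hit : "none"), $line;
    }
}
```

Only the first sample line satisfies the anchored pattern, while the unanchored one finds a keyword in both, which would explain a search over ordinary invoice text never matching.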

      An alternative to using join and string concatenation when making the pattern would be to change the default list separator from a space to the pipe symbol and use interpolation with the array or list.

          $ perl -Mstrict -Mwarnings -le '
          > my @arr = qw{abc d.ef gh?i};
          > my $patt;
          > {
          >     local $" = q{|};
          >     $patt = qq{xyz(@{ [ map quotemeta, @arr ] })123};
          > }
          > print $patt;'
          xyz(abc|d\.ef|gh\?i)123
          $

      You can also use this method with qr{...} which behaves like double quotes.
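
A minimal sketch of the qr{...} variant, assuming the escaped names are first stored in a plain array so they can be interpolated directly. The wrapper `make_regex` and the test strings are hypothetical; the `xyz...123` framing is reused from the one-liner above.

```perl
use strict;
use warnings;

# Hypothetical wrapper: compile the pattern while the list separator is
# temporarily "|", so the interpolated array becomes an alternation.
sub make_regex {
    my @escaped = map { quotemeta($_) } @_;
    local $" = q{|};    # restored automatically when the sub returns
    return qr{xyz(@escaped)123};
}

my @arr = qw{abc d.ef gh?i};
my $rx  = make_regex(@arr);

# The literal "d.ef" is accepted and captured.
print "matched: $1\n" if "xyzd.ef123" =~ $rx;

# "dXef" is rejected because the dot was escaped by quotemeta.
print "no match for xyzdXef123\n" unless "xyzdXef123" =~ $rx;
```

Because `local` limits the change to the enclosing block, other string interpolations in the program keep the default space separator.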

      Cheers,

      JohnGG

Re: RegExp to Search All Array Members?
by duff (Parson) on Jun 19, 2007 at 12:26 UTC