Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Greetings Fellow Monks:
I am in serious need of your wisdom and help.
The goal is to list the contents of a directory (no problem)
and do something with the files whose names match, via grep, two specific patterns: one that will always be at the start of the filename, and the other somewhere in the middle. In other words, I want to match a file that, for example, has STR1 at the beginning and _STR2_ (the underscores are required) somewhere in the middle.
With the correct regex:
STR1some_STR2_file.txt would match
whereas
STR1some_STR2file.txt would NOT match
With the code below I have managed to match the string at the beginning of the name, but I have no clue how to require both matches.
#!/usr/bin/perl -w
use strict;
use FileHandle;

my $localdir="C:\\PROJECTS\\";
my $id="LWX1";

opendir LOCALDIR, "$localdir" or die "Cannot open $localdir: $!";
my @local_files = grep /^$id/i, readdir LOCALDIR;
###my guess for both would be this: (doesn't work)
###where "_arbv_" is the 2nd string that must be matched
#my @local_files = grep /^$id&&_arbv_/i, readdir LOCALDIR;
closedir LOCALDIR;

##determine which files to download
foreach my $local (@local_files) {
    print "$local\n";
}
Your help in this matter is greatly appreciated, for I am stumped with this one! Thanks in advance.

Replies are listed 'Best First'.
Re: Perl Regexp help w/ grep
by Roy Johnson (Monsignor) on Oct 14, 2005 at 17:11 UTC
    For that, I'd say go with a file glob rather than explicit opendir/readdir/grep.
    my @local_files = glob('STR1*_STR2_*');
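A slightly fuller sketch of the glob approach, in case it helps. The helper name glob_matches and the scratch directory are made up for illustration and are not from the original post. Two things to keep in mind: when the pattern includes a directory, glob returns paths with that directory attached, and on Unix-like systems glob is case-sensitive, unlike the /i regex in the question.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Basename qw(basename);

# Hypothetical helper: names in $dir that start with $prefix and
# contain $middle somewhere after it. Assumes $dir has no spaces or
# glob metacharacters in it.
sub glob_matches {
    my ($dir, $prefix, $middle) = @_;
    # glob returns paths that include $dir, so reduce them to bare names.
    my @names = map { basename($_) } glob("$dir/$prefix*$middle*");
    return sort @names;
}

# Scratch directory holding the two sample names from the question.
my $dir = tempdir(CLEANUP => 1);
for my $name ('STR1some_STR2_file.txt', 'STR1some_STR2file.txt') {
    open my $fh, '>', "$dir/$name" or die "Cannot create $name: $!";
    close $fh;
}

# Only the name with both underscores around STR2 should be listed.
print "$_\n" for glob_matches($dir, 'STR1', '_STR2_');
```

If the case-insensitive match matters, the readdir/grep route with /i is probably the simpler fit.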

    Caution: Contents may have been coded under pressure.
Re: Perl Regexp help w/ grep
by VSarkiss (Monsignor) on Oct 14, 2005 at 17:40 UTC
Re: Perl Regexp help w/ grep
by injunjoel (Priest) on Oct 14, 2005 at 17:08 UTC
    Greetings all,
    Here is my suggestion.
    #the following is untested.
    my @local_files = grep /^$id.*?_arbv_/i, readdir LOCALDIR;
    The . matches any character (except a newline). The * matches the previous item 0 or more times, and the ? (in concert with the *, that is) keeps it from being too greedy, so it matches the shortest amount possible, not the longest.
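A quick way to check a pattern like this against sample names without touching a directory. The helper name name_matches and the sample filenames are invented for the test. The \Q...\E quoting is an addition not in the suggestion above: it keeps any regex metacharacters in the two strings from being interpreted.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if $name starts with $id and contains $tag
# somewhere after it (case-insensitive), 0 otherwise.
sub name_matches {
    my ($name, $id, $tag) = @_;
    # \Q...\E quotes metacharacters so both strings match literally.
    return $name =~ /^\Q$id\E.*?\Q$tag\E/i ? 1 : 0;
}

# Sample names modelled on the ones in the question.
for my $name ('LWX1some_arbv_file.txt', 'LWX1some_arbvfile.txt', 'xLWX1_arbv_.txt') {
    printf "%-24s %s\n", $name,
        name_matches($name, 'LWX1', '_arbv_') ? 'match' : 'no match';
}
```

Only the first name should report a match: the second lacks the trailing underscore, and the third does not start with the id.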

    -InjunJoel
    "I do not feel obliged to believe that the same God who endowed us with sense, reason and intellect has intended us to forego their use." -Galileo
      the ? keeps it from being too greedy

      I'm curious, what is the purpose of including this in a simple match? Seems to me that it would only make sense if it was a capturing match or a search and replace. Not that there's anything wrong with it, it just seems a bit pointless.

        Perhaps it's an attempt at optimization? A greedy RE will match the entire filename from that point on and then have to backtrack character by character to determine whether there is a match. By making the RE non-greedy, matching filenames will be found faster.
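For what it's worth, a small sketch of the point being discussed: as a plain yes/no filter the greedy and non-greedy forms select the same names, and the difference only shows up once something is captured. The sub names and sample filenames are invented for illustration. No timing is attempted here, so the speed question stays open.

```perl
use strict;
use warnings;

# Invented sample names; the first contains _arbv_ twice.
my @names = (
    'LWX1a_arbv_b_arbv_c.txt',
    'LWX1some_arbv_file.txt',
    'LWX1some_arbvfile.txt',
);

# Hypothetical helpers: the same filter with .* and with .*?
sub greedy_matches { return grep { /^LWX1.*_arbv_/i  } @_ }
sub lazy_matches   { return grep { /^LWX1.*?_arbv_/i } @_ }

my @greedy = greedy_matches(@names);
my @lazy   = lazy_matches(@names);
print "same files selected\n" if "@greedy" eq "@lazy";

# With a capture group the two forms differ on the first name:
# greedy runs up to the last _arbv_, lazy stops at the first one.
my ($g) = $names[0] =~ /^LWX1(.*)_arbv_/i;
my ($l) = $names[0] =~ /^LWX1(.*?)_arbv_/i;
print "greedy captured: $g\n";
print "lazy captured:   $l\n";
```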