biggin777 has asked for the wisdom of the Perl Monks concerning the following question:

Ok, I need some help. I need a program that will go to directories, look in them for files that match certain criteria, then look in those matching files for multiple things. That is the part I am having a problem with. Example:
opendir(DIR, $some_dir) || die "can't opendir $some_dir: $!";
while ($file = readdir(<DIR>)) {
    if ($file =~ /Somthing/){
        open(FILE, $file);
        #Now I need it to match 3 things that
        #will be on different lines in the file.
        #The words PASS,sweeps,Final.Also
        #these files are big(100mb), so its not a good
        #idea to suck them into an array and grep,
        #unless it's my last choice.
        close(FILE);
    }
    close(DIR);
}

Replies are listed 'Best First'.
Re: Perl Matching Question
by BrowserUk (Patriarch) on Sep 10, 2003 at 07:01 UTC

    Replace your comment block with something like:

    while( <FILE> ) {
        print "Found $1 in $file; line $. :offset $-[0]\n"
            if /(PASS)/ or /(sweeps)/ or /(Final)/;
    }

    You don't say what you want to do if you find a match, or whether a match means you found any of your three words or all of them. The above does the former. For the latter, one way would be to break the if into three, set a flag for each word, and take some action (like leaving the loop early) once all three flags are set.
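A minimal sketch of that flag-per-word idea, assuming a match means all three words appear somewhere in the file. The helper name has_all_three and the throwaway demo file are illustrative only, not anything from the original post:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: returns 1 if the file at $path contains all three
# words somewhere, reading one line at a time so a 100mb file is never
# held in memory.
sub has_all_three {
    my ($path) = @_;
    open(my $fh, '<', $path) or return 0;    # treat unreadable files as no match
    my ($pass, $sweeps, $final) = (0, 0, 0);
    while (my $line = <$fh>) {
        $pass   = 1 if $line =~ /PASS/;
        $sweeps = 1 if $line =~ /sweeps/;
        $final  = 1 if $line =~ /Final/;
        last if $pass && $sweeps && $final;  # stop reading as soon as all are seen
    }
    close($fh);
    return ($pass && $sweeps && $final) ? 1 : 0;
}

# Small demonstration on a temporary file.
my ($tmp, $tmpname) = tempfile(UNLINK => 1);
print {$tmp} "PASS\nnoise\nsweeps\nFinal\n";
close($tmp);
print has_all_three($tmpname) ? "all three found\n" : "not all found\n";
```

Because each flag is tested with its own if, the words may share a line or sit on separate lines; the early last is what keeps the scan cheap on big files.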


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
    If I understand your problem, I can solve it! Of course, the same can be said for you.

Re: Perl Matching Question
by shenme (Priest) on Sep 10, 2003 at 06:53 UTC
    While debugging, throw in lots of prints and try things out. That would tell you that $file is just the filename within the directory, not the whole path you need in your open(FILE, ...). But then, you also forgot the or die, which would have told you the same thing.
    opendir(DIR, $some_dir) || die "can't opendir '$some_dir': $!";
    while ($file = readdir(DIR)) {
        if ($file =~ /Somthing/) {
            my $fullpath = "$some_dir/$file";
            open(FILE, $fullpath)
                or die "can't open file '$fullpath' for reading: $!";
            while (<FILE>) {
                if ( m/PASS/ ) {
                    print $_;
                }
            }
            # look around ... I'm sure you can find examples
            # of reading files line by line
            close(FILE);
        }
    }
    closedir(DIR);
Re: Perl Matching Question
by gjb (Vicar) on Sep 10, 2003 at 08:29 UTC

    BrowserUk's solution implies that you want to do something with a file if it contains any one of the words 'PASS', 'sweeps', 'Final', but from your post I take it a file should match all three. The following snippet should do that.

    open(FILE, "$some_dir/$file") or die("can't open $file");
    my ($PASSmatch, $sweepsmatch, $Finalmatch);
    while (<FILE>) {
        if (/\bPASS\b/) {
            $PASSmatch = 1;
        } elsif (/\bsweeps\b/) {
            $sweepsmatch = 1;
        } elsif (/\bFinal\b/) {
            $Finalmatch = 1;
        }
    }
    close(FILE);
    if ($PASSmatch && $sweepsmatch && $Finalmatch) {
        # do whatever with this file.
    }
    This is not going to win a prize for elegance or generality, but it should be along the lines you want.

    Hope this helps, -gjb-

    Update: graff makes two good points about the code above: 1) one shouldn't die in a directory scan when a single file can't be read, and 2) it would be more efficient to last out of the while loop as soon as all three words have been found.
      I agree with your reading of the post (I wonder if biggin777 does too...), and with your approach (well, erm, if you're scanning through a directory, then it seems a bit harsh to die because open fails on a given file).

      Anyway, since the files are big, it would be nice to exit the while loop ASAP. Granting that all three conditions need to be met to trigger further processing, there's no point in keeping track of them separately:

      my @keepers;
      opendir(DIR, $some_dir) || die "can't opendir $some_dir: $!";
      while ($file = readdir(DIR)) {
          my $path = "$some_dir/$file";   # readdir returns bare names
          if ($file =~ /Somthing/ and -f $path and open(FILE, $path)) {
              my $pass = 0;
              while (<FILE>) {
                  $pass++ if (/\b(?:PASS|sweeps|Final)\b/);
                  last if ( $pass == 3 );
              }
              close FILE;
              push @keepers, $file if ( $pass == 3 );
          }
      }
      closedir(DIR);
      # now do whatever needs to be done with @keepers.
      (or maybe something needs to be done with @keepers in that same while loop? but that might complicate things a lot; perhaps there'll be another question from biggin777 about that in a little while...)

      update: Thanks to AM's very astute reply below, I see where it might be important to keep track of each different condition separately. To keep it brief, I would just set a different bit for each condition:

      ...
      my $pass = 0;
      while (<FILE>) {
          $pass |= 1 if ( /\bPASS\b/ );
          $pass |= 2 if ( /\bsweeps\b/ );
          $pass |= 4 if ( /\bFinal\b/ );
          if ( $pass == 7 ) {
              push @keepers, $file;
              last;
          }
      }
      ...

        With this code, if you put PASS sweeps Final on the same line, it fails. Also, you do not record which words you've seen, so a file that says PASS on three separate lines will be accepted. Too bad the files are so big, as you could otherwise have one line inside your while ($file ... loop:

        push @keepers, $file
            if $file =~ /Something/ and -f $file and 3 == @{ [ do {
                my @a = (my $temp = do { local(*ARGV, $/) = [$file]; <> })
                        =~ /\b(PASS|sweeps|Final)\b/g;
                my %b; undef @b{@a}; keys %b
            } ] };

        Yeah, no error checking or anything. :)

        Anonymously yours
        Anonymous Monk
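A line-by-line variant of the same idea might look like the sketch below: it records which distinct words have been seen in a hash, so words sharing a line are all counted and a repeated PASS does not count three times, while the file is never slurped. The helper name seen_all_words is hypothetical, and the sketch assumes the word list contains no duplicates:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: returns 1 once every word in @words has been seen
# at least once in the file at $path. Assumes @words has no duplicates.
sub seen_all_words {
    my ($path, @words) = @_;
    my $alternation = join '|', map { quotemeta($_) } @words;
    my $re = qr/\b($alternation)\b/;
    open(my $fh, '<', $path) or return 0;    # treat unreadable files as no match
    my %seen;
    my $found = 0;
    LINE: while (my $line = <$fh>) {
        # /g in scalar context walks every match on the line, not just the first
        while ($line =~ /$re/g) {
            $seen{$1} = 1;
            if (keys(%seen) == @words) {
                $found = 1;
                last LINE;                   # all words seen: stop reading
            }
        }
    }
    close($fh);
    return $found;
}

# Small demonstration: all three words on a single line.
my ($tmp, $tmpname) = tempfile(UNLINK => 1);
print {$tmp} "PASS sweeps Final\n";
close($tmp);
print seen_all_words($tmpname, qw(PASS sweeps Final)) ? "keeper\n" : "skip\n";
```

Inside the directory loop this could stand in for the counter, for example push @keepers, $file if seen_all_words("$some_dir/$file", qw(PASS sweeps Final)).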

Re: Perl Matching Question
by Roger (Parson) on Sep 10, 2003 at 06:34 UTC
    You could use the external program grep (assuming you are working on a Unix platform). Then just do this:

    $status = system("/usr/xpg4/bin/grep -e PASS -e sweeps -e Final -q $filename");
    If $status is 0, the file contains at least one of the patterns; a nonzero status means no pattern matched (or grep hit an error). The -q option suppresses grep's output and lets it quit after the first match, which improves performance, so it is the exit status rather than the output that tells you the result.
Re: Perl Matching Question
by nite_man (Deacon) on Sep 10, 2003 at 06:47 UTC

    Take a look at the File::Find::Rule module. I think it will help you solve your problem.

    _ _ _ _ _ _
      M i c h a e l