Shylock has asked for the wisdom of the Perl Monks concerning the following question:

Esteemed monks

I'm reading a file of field names and then, for each field, checking which file in a directory references it, i.e.:

$dirName = "/blahdeblah/source";
open (FIELDS, "fields.dat") or die $!;
open (FOUND, ">found.dat") or die $!;
while (<FIELDS>) {
    $field = $_;
    while (defined($nextfile = <$dirName/*cm*.4gl>)) {
        $rc = system ("grep $field $nextfile 1>/dev/null");
        if (!$rc) {
            print FOUND "$field $nextfile\n";
            last;
        }
    }
}

However, when I retrieve the next field from the FIELDS list, the glob resumes from the point at which it stopped on the previous run. How can I reset the glob to start from the beginning?

Many thanks for your replies

Edited by Chady -- added code tags.

Replies are listed 'Best First'.
Re: Glob seekpointer
by Tanktalus (Canon) on Feb 25, 2005 at 05:37 UTC

    According to perlop, this is exactly what it does - it will keep going until the end before starting over. Thus, you want something like:

    $rc = 1;
    while (defined($nextfile = <$dirName/*cm*.4gl>)) {
        if ($rc) {
            $rc = system("grep $field $nextfile 1> /dev/null");
            if (!$rc) {
                print FOUND "$field $nextfile\n";
            }
        }
    }
    But I still don't like it. Use glob rather than the <> operator.
    use File::Spec;   # provides catfile

    for my $nextfile (glob File::Spec->catfile($dirName, '*cm*.4gl')) {
        $rc = system("grep $field $nextfile 1> /dev/null");
        if (!$rc) {
            print FOUND "$field $nextfile\n";
            last;
        }
    }
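    To make the difference between the two forms concrete, here is a minimal sketch. It assumes a Unix-like system and a scratch directory (created with File::Temp) whose path contains no whitespace; the three file names are hypothetical fixtures, not part of the original problem.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Basename qw(basename);

# Hypothetical fixture: a scratch directory holding three empty files.
# Assumes the temp path has no whitespace, which glob would split on.
my $dir = tempdir( CLEANUP => 1 );
for my $name (qw(a.txt b.txt c.txt)) {
    open my $fh, '>', "$dir/$name" or die "$dir/$name: $!";
    close $fh;
}

my ( @scalar_seen, @list_seen );

for my $round ( 1 .. 2 ) {
    # Scalar context: this one glob op keeps its position between rounds,
    # so leaving early with "last" means the next round resumes mid-list.
    while ( defined( my $file = glob("$dir/*.txt") ) ) {
        push @scalar_seen, basename($file);
        last;
    }

    # List context: the whole list is rebuilt each time the loop starts.
    for my $file ( glob("$dir/*.txt") ) {
        push @list_seen, basename($file);
        last;
    }
}

print "scalar context: @scalar_seen\n";
print "list context:   @list_seen\n";
```

    The scalar-context loop picks up a.txt and then b.txt, while the list-context loop sees a.txt on both rounds, which is the behaviour the original question was after.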
    I'd also recommend a few minor other changes:
    • Opening for reading: make it explicit with "<". open(FIELDS,"<fields.dat");. Also, use IO::File. But I just like the objects better.
    • Your while can assign directly to $field: while (my $field = <FIELDS>). Don't use $_ if you're not using $_.
    • Use the array version of system. To do this, you'll need a full path to grep, and you'll need to redirect STDOUT yourself. Arguably not worth the effort (it is a lot of effort if you want to use STDOUT elsewhere). But all it takes is getting bit merely once by a shell interpolation of something you didn't expect to make the benefits all too obvious.
    Hope this helps.
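    As a rough illustration of the list form of system mentioned above, the sketch below wraps the grep call in a hypothetical helper named file_mentions. It assumes a POSIX grep is reachable through PATH, and it swaps the manual STDOUT redirection for grep's -q option, so it is one possible shape rather than the only way to do it.

```perl
use strict;
use warnings;

# Hypothetical helper: true if grep finds $pattern in $file.
# The list form of system passes each argument to grep unchanged,
# so no shell ever gets a chance to interpret the field name.
sub file_mentions {
    my ( $pattern, $file ) = @_;

    # -q suppresses grep's output; "--" ends option parsing in case
    # the pattern happens to start with a dash.
    my $rc = system( 'grep', '-q', '--', $pattern, $file );
    die "could not run grep: $!" if $rc == -1;

    return $rc == 0;    # grep exits 0 only when it found a match
}

# Example use inside the loop from the question:
#   print FOUND "$field $nextfile\n" if file_mentions( $field, $nextfile );
```

    Note that $field still needs a chomp before it is passed along, since a line read from FIELDS carries its trailing newline.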

      Thanks, that did the trick.
Re: Glob seekpointer
by tilly (Archbishop) on Feb 25, 2005 at 06:00 UTC
    Tanktalus gave you useful information, but I'd suggest a major change.

    Right now you're clearly thinking like a shell programmer. The algorithm that you're using is buggy and O(n*n). It is buggy because if you have fields named "name" and "namespace", your current code will think that it sees the former in the latter. Plus you may mistake data and metadata. Furthermore, the 'last' optimization that you're trying to use only saves you about 50% of the total work that you might wind up doing.

    You can (and should) do better.

    The right approach is to process FIELDS to create a hash of known field names. Then read through your directory, opening each file, parsing out the fieldnames and doing a hash lookup to see whether you're interested in it. If you are then print it to FOUND.

    This is admittedly somewhat more complex to do. However it will only have to make one pass through the directory, and produces far more accurate results. For anything more than a one-off, I'd be sold based on reliability alone. The performance win would just be icing on the cake.
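    A minimal sketch of that hash-based approach follows. The helper names load_fields and fields_in_file are hypothetical, and the tokenizing rule (a field name is any run of word characters) is an assumption; real 4GL source would likely need something smarter to skip comments and string literals.

```perl
use strict;
use warnings;

# Read one field name per line into a hash for constant-time lookups.
sub load_fields {
    my ($path) = @_;
    open my $in, '<', $path or die "$path: $!";
    my %wanted;
    while ( my $line = <$in> ) {
        chomp $line;
        $wanted{$line} = 1 if length $line;
    }
    close $in;
    return \%wanted;
}

# Return the sorted, de-duplicated wanted fields that occur in $file.
# Assumption: a field name is a run of word characters (\w+).
sub fields_in_file {
    my ( $file, $wanted ) = @_;
    open my $fh, '<', $file or die "$file: $!";
    my %hit;
    while ( my $line = <$fh> ) {
        for my $token ( $line =~ /(\w+)/g ) {
            $hit{$token} = 1 if $wanted->{$token};
        }
    }
    close $fh;
    my @found = sort keys %hit;
    return @found;
}

# Example driver, using the paths from the question:
#   my $wanted = load_fields('fields.dat');
#   for my $file ( glob '/blahdeblah/source/*cm*.4gl' ) {
#       print "$_ $file\n" for fields_in_file( $file, $wanted );
#   }
```

    Because whole tokens are looked up, a file that only mentions "namespace" is not reported for "name". Unlike the original loop, this prints every file a field appears in rather than just the first.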

      Yes, I'm fairly new to Perl. Thanks for the advice on better ways of doing things
Re: Glob seekpointer
by BrowserUk (Patriarch) on Feb 25, 2005 at 06:47 UTC

    This might work for you provided your files are each less than a couple of hundred MB. If that's the case, it should be a bit quicker than your algorithm above.

    #! perl -slw
    use strict;

    my( $dir, $glob, $fieldsFile ) = @ARGV;

    open FIELDS, '<', $fieldsFile or die $!;
    chomp( my @fields = sort{ length( $b ) <=> length( $a ) } do{ local $/ = "\n"; <FIELDS> } );
    close FIELDS;

    my $re_fields = qr"(@{[ join'|', map{ quotemeta } @fields ]})";

    print "@$_" for map {
        open my $fh, '<:raw', $_ or warn "$_ : $!";
        do{ local $/; <$fh> } =~ $re_fields ? [ $1, $_ ] : ()
    } glob "$dir/$glob";

    __END__
    P:\test>type fields.txt
    map
    grep
    List::Util

    P:\test>mgrep . *.pl fields.txt
    List::Util ./200083.pl
    List::Util ./250802.pl
    map ./282393.pl
    map ./306836-trietest.pl
    map ./308236.pl
    map ./324513.pl
    map ./328326.pl
    map ./330590.pl
    List::Util ./332124.pl
    map ./333094.pl
    map ./333937.pl
    grep ./334482.pl
    grep ./335040.pl
    map ./336125.pl
    List::Util ./336580.pl
    ...

    You can also supply CON (or maybe /dev/tty on *nix?) in place of the last parameter to have it take the search terms from the keyboard:

    P:\test>mgrep . *.pl CON
    min
    max
    ^Z
    max ./200083.pl
    max ./250802.pl
    min ./333502.pl
    min ./335040.pl
    min ./336125.pl
    ...

    Examine what is said, not who speaks.
    Silence betokens consent.
    Love the truth but pardon error.
Re: Glob seekpointer
by mifflin (Curate) on Feb 25, 2005 at 05:28 UTC
    If what you want is to reset the FIELDS file handle to start reading from the beginning of the file, you want the seek function.

    try putting in a

    seek(FIELDS, 0, 0);
    before the 'last'
      I think that the question is very clear. He wants the glob to restart from the beginning of his directories every time he breaks out of the inner loop. Resetting FIELDS is a useless change that will not help.