foreach vs. while inside a directory traversing subroutine

locked_user TomJerry has asked for the wisdom of the Perl Monks concerning the following question:

This is a very simple test program for traversing a directory.

I have two versions; the only difference is how they list a directory: one uses while(<*>), the other uses foreach(). Both should work.

However, something strange happens: the 1st always gets stuck in an infinite loop, while the 2nd works fine. Why?

 # 1st form
sub traverse_file
{
    my $dir = shift;
    if (-d $dir) {
        while (<$dir/*>) {
            traverse_file($_);
        }
    } else {
        print "$dir\n";
    }   
}

 # 2nd form
sub traverse_file
{
    my $dir = shift;
    if (-d $dir) {
        my @subdirs=<$dir/*>;
        foreach (@subdirs){
            traverse_file($_);
        }
    } else {
        print "$dir\n";
    }   
}


Re: foreach vs. while inside a directory traversing subroutine
by Athanasius (Archbishop) on May 26, 2015 at 04:45 UTC

Hello TomJerry, and welcome to the Monastery!

Consider the following quote from I/O Operators:

A (file)glob evaluates its (embedded) argument only when it is starting a new list. All values must be read before it will start over. In list context, this isn’t important because you automatically get them all anyway. However, in scalar context the operator returns the next value each time it’s called, or undef when the list has run out.

Although this doesn’t explain why the while version of sub traverse_file produces an infinite loop, it is suggestive: All values must be read before it will start over. It looks as though the while version — which evaluates the glob function in scalar context, but makes a recursive call before the original list of filenames has been exhausted — is exhibiting what in C would be called “undefined behaviour.”

I think the documentation could be clearer on this point, but the moral is apparent: use glob in scalar context only if the list it returns will be exhausted before the next call to glob. In all other cases, use glob in list context, as in the foreach version of sub traverse_file. This is always the safer option.
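The quoted rule can be observed directly with a small experiment. The following is only a sketch under stated assumptions: the directory layout (a/1, a/2, b/1) and the helper name next_match are hypothetical, invented for illustration. It calls one and the same glob operator in scalar context with changing patterns, which is roughly what the recursive call in the while version does, and may be what keeps that version going round in circles.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

# Build a throwaway tree: a/1, a/2 and b/1 (hypothetical names).
my $orig = getcwd();
my $tmp  = tempdir(CLEANUP => 1);
chdir $tmp or die "chdir $tmp: $!";
mkdir $_ or die "mkdir $_: $!" for qw(a b);
for my $file (qw(a/1 a/2 b/1)) {
    open my $fh, '>', $file or die "open $file: $!";
    close $fh;
}

# A single glob operator, always evaluated in scalar context.
# (next_match is a hypothetical helper, not part of any module.)
sub next_match {
    my $pattern = shift;
    return scalar glob $pattern;
}

my @seen;
push @seen, next_match('a/*');   # starts a new list from a/*
push @seen, next_match('b/*');   # list not exhausted yet: new pattern is not evaluated
push @seen, next_match('b/*');   # list has run out: undef
push @seen, next_match('b/*');   # only now is b/* expanded

print join(', ', map { defined $_ ? $_ : 'undef' } @seen), "\n";
# prints: a/1, a/2, undef, b/1

chdir $orig or die "chdir $orig: $!";   # leave the temp dir so it can be cleaned up
```

In the while version the recursive call plays the role of the second call here: it asks for the contents of a subdirectory but is handed the next entry of the parent's unfinished list instead. Assigning the glob to an array, as the foreach version does, reads the whole list in one go and so avoids the shared state.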

Hope that helps,

Athanasius <°(((>< contra mundum Iustus alius egestas vitae, eros Piratica,


Re^2: foreach vs. while inside a directory traversing subroutine

by locked_user TomJerry (Initiate) on May 26, 2015 at 06:25 UTC

It seems so. We need to be more careful with glob or <> from now on. Thanks for the excellent explanation!

Re: foreach vs. while inside a directory traversing subroutine
by wrog (Friar) on May 26, 2015 at 03:48 UTC

To see what is going on inside traverse_file, add some print statements, i.e., replace

traverse_file($_);

with

print STDERR "> $_\n";
exit 0 if ++$count > 100;
traverse_file($_);
print STDERR "<\n";

having first declared our $count=0;


Re: foreach vs. while inside a directory traversing subroutine
by locked_user sundialsvc4 (Abbot) on May 26, 2015 at 11:36 UTC

Not directly related to the foregoing (and promptly up-voted) explanation, but I’d like to add two tangential comments on the general subject of traversing directory structures:

(1) Particularly on Windows, I have over the years run into various problems such as limits on the number of directory searches you can have open at one time. (Deeply nested directory structures started throwing OS errors as, apparently, Windows ran out of a limited resource. Explicitly closing the cursors was also very important.) Therefore, it was necessary to use a “to-do list” of my own making. But in any case, you don’t have to futz with things like that, since there are plenty of known-good routines (e.g. File::Find, which ships with Perl, and others on CPAN) that have been tested against very deep structures. You should probably use these routines in preference to your own, especially if you deal with Windows. Otherwise you can encounter “gotchas” that are nothing more than time-and-money wasters.

(2) Also in the same vein, and also on Windows, I have encountered unexpected problems with directory searches (skipping entries, finding entries more than once, OS errors) when I attempted to do something with the files that were found (and especially, anything-at-all to do with directories), within the traverse loop. Therefore, I categorically advise, “first do one, then do the other.” Traverse to accumulate a to-do list, then separately process the list. That always does the job without interference.
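As a rough illustration of both points, here is a minimal sketch that lets File::Find do the walking and only records file names, then works through the finished list afterwards. The sub name collect_files and the use of the first command-line argument as the starting directory are assumptions made for this example; find, the wanted and no_chdir options, and $File::Find::name are part of File::Find itself.

```perl
use strict;
use warnings;
use File::Find;

# Phase 1: walk the tree and only record what was found.
# (collect_files is a hypothetical helper name.)
sub collect_files {
    my $root = shift;
    my @todo;
    find(
        {
            # With no_chdir, $_ holds the same path as $File::Find::name.
            wanted   => sub { push @todo, $File::Find::name if -f $_ },
            no_chdir => 1,
        },
        $root,
    );
    @todo = sort @todo;
    return @todo;
}

# Phase 2: process the finished list; the traversal is already over,
# so nothing done here can disturb a directory search in progress.
my $root = shift(@ARGV) // '.';
for my $file (collect_files($root)) {
    print "$file\n";
}
```

Printing stands in for whatever real work is needed; renaming, deleting or moving files would go in the second loop, never inside the wanted callback.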

Re^2: foreach vs. while inside a directory traversing subroutine

by Anonymous Monk on May 26, 2015 at 12:00 UTC

File::Find ... tested against very deep structures.

How deep exactly are the directory structures that File::Find is tested against?
