sbank has asked for the wisdom of the Perl Monks concerning the following question:

Esteemed Monks,

I have some code along the lines of:

$path = "/var/tmp/decode1"; opendir(DIR, $path); @files = grep { /\.dat$/ } readdir(DIR); closedir(DIR); foreach $file (@files) { open(INPUT, "$path/$file") or die "can't open file $file: $!"; # do some stuff close INPUT; }
If I have ~20 files in this directory, there's no problem. But when I work on the order of a couple thousand files, I run into a system limit.

Shouldn't close() release the file descriptor, so that when the loop moves on to my next file, open() gets a fresh file handle? I'm not explicitly forking, and I would think I'm only working on one file at a time (not all 1000 of them). Apparently that's not the case.
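
For what it's worth, here is a minimal sketch of that assumption (it just reuses $path and @files from the code above): if close() really does release the descriptor each time, fileno() should report the same low number on every pass.

foreach $file (@files) {
    open(INPUT, "$path/$file") or die "can't open file $file: $!";
    # if descriptors are being released, this prints the same number each time
    print "fd for $file: ", fileno(INPUT), "\n";
    close INPUT;
}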

So instead of trying to use threading (which I couldn't get to work properly), should I just fork children myself and wait for each child to come back? All I want to do is work on one file at a time in a directory of my choice.
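
For reference, a minimal fork-and-wait sketch of that idea might look like the following; process_file() is a hypothetical stand-in for the per-file work, not anything from my script.

foreach $file (@files) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # child: handle exactly one file, then exit
        process_file("$path/$file");   # hypothetical per-file routine
        exit 0;
    }
    waitpid($pid, 0);                  # parent: wait before starting the next file
}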

TIA

Re: Question on fork and close.
by kschwab (Vicar) on May 29, 2001 at 23:57 UTC
    As mentioned in the other comment(s), perhaps @files is too large? You don't mention what actually happens.

    Try this instead:

    path = "/var/tmp/decode1"; opendir(DIR, $path); while (defined($file=readdir(DIR))) { next unless ($file =~ /\.dat$/); open(INPUT, "$path/$file") or die "can't open file $file: $!"; # do some stuff close INPUT; } closedir(DIR);
    This should help, since readdir() in scalar context returns just the next directory entry rather than the whole list at once. If it doesn't, supplying the actual error message you are running into would let us give you a better answer.
      Well long live the error message! It definitely is pointing to my open2 call.
      open2: fork failed: Resource temporarily unavailable at /home/sbank/gzip.pl line 74
      Here's more meat from my script. (You can call it a sloppy joe, because of my poor coding.)
      {
          open(OUTPUT, "$outfile$$") or die "can't open file $outfile$$: $!";
          open2(\*GZIP_IN, \*GZIP_OUT, "$gzip -dc -q $outfile$$")
              or die "cannot open2 $gzip: $!";
          until ( eof(OUTPUT) ) {
              # read in chunks of 1024.
              read(OUTPUT, $buffer, 1024);
              print GZIP_OUT $buffer;
          }
          close GZIP_OUT;
          select STDOUT; $| = 1;   # make unbuffered
          while (<GZIP_IN>) {
              # some other stuff
              print STDOUT "$_";
          }
          close GZIP_IN;
          close INPUT;
          unlink "$outfile";
          unlink "$outfile$$";
      }
      Line 74 is the open2 line.
        Yeah! If you have a compressed file of roughly more than 8K (the exact size depends on your system), gzip stalls and your process table loses 2 slots. Once no slots are left, you get that error. Why your script doesn't stop and only gzip does beats me.
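        If all you need is gzip's decompressed output, one way to sidestep open2 and the pipe-buffer problem entirely is a plain read-only pipe. This is only a sketch under that assumption, not a drop-in fix, and it keeps the handle names and variables from the script above:

        # let gzip read the compressed file itself; we only read its output
        open(GZIP_IN, "$gzip -dc -q $outfile$$ |")
            or die "cannot start $gzip: $!";
        while (<GZIP_IN>) {
            # some other stuff
            print STDOUT $_;
        }
        close GZIP_IN or warn "$gzip exited with status $?";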
Re: Question on fork and close.
by Anonymous Monk on May 29, 2001 at 23:41 UTC
    Could it be that your memory (the @files array) is exhausted? Or does your "some stuff" allocate resources from the system (which system is it?) and never free them? Or is it a CGI that needs longer for 10000 files than for 20, so a TIMEOUT occurs? Did you account for the fact that 20000 files can need 500000 times as long to open as 20 files do?

      I doubt that the @files array is exhausting memory, so most likely it is my "some stuff" that is causing the problem. (This is not in a CGI environment, just a straight script.)

      I'm guessing that this is the offending line.

      open2(\*GZIP_IN, \*GZIP_OUT, "$gzip -dc -q $outfile$$") or die "cannot open2 $gzip: $!";

      Again, this should only be one fork (for the one file that gzip is acting upon). At least I think it should be just one fork. (Obviously it isn't. :) )

        Opening a pipe both from and to "gzip -dc"? Perhaps if you are doing line-mode input from the compressed source, that's the problem:
        while (<WHEREEVERTHECOMPRESSEDDATAIS>) {
            # $_ could be potentially huge, if it doesn't
            # contain a newline somewhere soon
            print GZIP_IN;
        }
        See read() or sysread() if that's it. But then, that's just a guess...good luck.
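        For example (just a sketch, using the same hypothetical handle names as above), reading the compressed source in fixed-size chunks keeps each write small no matter where the newlines fall:

        my $buffer;
        while (read(WHEREEVERTHECOMPRESSEDDATAIS, $buffer, 8192)) {
            print GZIP_IN $buffer;   # at most 8K per chunk, newlines or not
        }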