DenairPete has asked for the wisdom of the Perl Monks concerning the following question:

I need the wisdom of PerlMonks!!! I am running a script that opens a directory and puts files that end in .html into an array. The directory contains 1 million files total, with about half of them having the .html extension. When I run my script I get "Out of Memory". Here is how I am pushing the files into the array:

opendir(DIR, $accumulatorDir) or die "$!\n";
my @jrnFiles = map $_, grep /\.html$/, readdir DIR;
closedir(DIR);

Is there an alternative I can use that is semi-efficient? Java has no problem doing this with its "java.io.File.listFiles".

Re: Using READDIR runs out of memory
by afoken (Chancellor) on Mar 20, 2018 at 19:21 UTC
    I am running a script that opens a directory and puts files that end in .html into an array. The directory contains 1 million files total, with about half of them having the .html extension. When I run my script I get "Out of Memory":

    I would not expect that to happen. Perl should easily handle an array containing a million records. Anyway, another approach would be to iterate over the directory entries one at a time. Something like this:

    opendir my $d, $dirname or die "Could not open $dirname: $!";
    while (defined(my $item = readdir $d)) {
        $item =~ /\.html$/ or next;
        work_on_the_item($item);
    }
    closedir $d;

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

      Thanks Alexander! I don't believe it's the Array storage that's the problem. It's the READDIR

        I don't believe it's the Array storage that's the problem.

        Well, I don't believe in the FSM, but it may still exist after all.

        Why don't you simply test if iterating solves the problem?

        It's the READDIR

        It's called readdir, not READDIR. And its behaviour is very different when used in scalar context instead of list context. RTFM.
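        A minimal sketch of the difference, reusing $accumulatorDir from the original post and the work_on_the_item placeholder from the reply above:

        # List context: readdir returns every remaining entry at once,
        # so roughly a million names sit in memory before grep filters them.
        opendir my $dh, $accumulatorDir or die "Could not open $accumulatorDir: $!";
        my @all_html = grep { /\.html$/ } readdir $dh;
        closedir $dh;

        # Scalar context: readdir hands back one entry per call (undef at the end),
        # so only a single name is held at a time.
        opendir $dh, $accumulatorDir or die "Could not open $accumulatorDir: $!";
        while (defined(my $entry = readdir $dh)) {
            next unless $entry =~ /\.html$/;
            work_on_the_item($entry);
        }
        closedir $dh;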

        Alexander

        --
        Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re: Using READDIR runs out of memory
by Marshall (Canon) on Mar 20, 2018 at 20:46 UTC
    I'm not sure what you mean to accomplish with the map; map $_ just returns its input list unchanged.

    To get only names ending in .html, I would do this:

    my @jrnFiles = grep { /\.html$/ } readdir DIR;

    Perl should be fine with 1 million files in the directory.

Re: Using READDIR runs out of memory
by Anonymous Monk on Mar 20, 2018 at 19:18 UTC
    Why are you using map? It only increases memory usage. For further savings, use a while loop with push instead of grep, as in the sketch below.
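    A minimal sketch of that suggestion, reusing the $accumulatorDir and @jrnFiles names from the original post:

    opendir my $dh, $accumulatorDir or die "Could not open $accumulatorDir: $!";
    my @jrnFiles;
    while (defined(my $entry = readdir $dh)) {
        # Only matching names are pushed; no intermediate list of all entries is built.
        push @jrnFiles, $entry if $entry =~ /\.html$/;
    }
    closedir $dh;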