neilwatson has asked for the wisdom of the Perl Monks concerning the following question:

Greetings, Brothers,

I have a file whose size is X bytes. I'd like to read the file into an array (1 line is 1 array element). Easy enough. However, I do not want the array to be greater than 24 kilobytes in size. If the size is greater than that, the script should continue reading the contents into a new array.

In the end I should have some number of arrays, each smaller than 24 kB, whose combined contents equal the original file.

Neil Watson
watson-wilson.ca

Re: Controlling the Size of an Array based on its file source.
by Joost (Canon) on May 29, 2002 at 13:37 UTC
    If you mean controlling the size of the data in the array, you can do something like:

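    my @arrays = ( [] );
    my $length = 0;
    while (<>) {
        # length is a named unary operator, so it needs explicit parens:
        # "length $_ + $length" would parse as length($_ + $length).
        if (length($_) + $length >= 24 * 1024) {
            # This line would reach the limit: start a new array with it.
            push @arrays, [$_];
            $length = length $_;
        }
        else {
            push @{ $arrays[-1] }, $_;
            $length += length $_;
        }
    }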
    which will fill @arrays with anonymous arrays, each holding less than 24 Kbytes of data (except when a single line is itself bigger than 24 Kbytes).
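
    The same idea can be sketched as a reusable sub. This is only an illustration, not code from the original post: the name group_lines_by_size and the $limit parameter are made up here, and it assumes the handle is read as raw bytes, so that length counts bytes rather than characters. It also avoids leaving an empty first array when the very first line already reaches the limit.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): group the lines of $fh
# into arrays whose combined line lengths stay below $limit bytes.
sub group_lines_by_size {
    my ($fh, $limit) = @_;
    my @arrays;
    my $length = 0;
    while (my $line = <$fh>) {
        my $len = length $line;
        # Start a new group when there is none yet, or when adding this
        # line would reach the limit. A single oversized line still gets
        # a group of its own.
        if (!@arrays or $length + $len >= $limit) {
            push @arrays, [];
            $length = 0;
        }
        push @{ $arrays[-1] }, $line;
        $length += $len;
    }
    return @arrays;
}

# Usage with an in-memory file handle and a tiny 11-byte limit.
my $data = "aaaa\nbbbb\ncccc\n";
open(my $in, '<', \$data) or die "open: $!";
my @groups = group_lines_by_size($in, 11);
printf "%d groups\n", scalar @groups;
```

    For the original question the call would be group_lines_by_size($fh, 24 * 1024).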

    Controlling the actual size of the arrays (in terms of allocated memory) is quite a lot trickier (if at all possible) in pure perl.

    Another question would of course be: why would you want this anyway?

    -- Joost downtime n. The period during which a system is error-free and immune from user input.
Re: Controlling the Size of an Array based on its file source.
by broquaint (Abbot) on May 29, 2002 at 13:42 UTC
    However, I do not want the array to be greater than 24 kilobytes in size.
    I assume you mean that the data should take up no more than 24 kB when written to disk, since the actual size of the array in memory will be greater than 24 kB because of the extra information Perl associates with every value.

    If you're going for exact sizes and you're not too worried about the lines, you could just set $/ to a reference to 24576 (which makes Perl read fixed-size records instead of lines) and read the file in normally, e.g.

    open(my $fh, "mahoosive_file") or die("ack - $!");
    my @chunks;
    {
        local $/ = \24576;
        push @chunks, $_ while <$fh>;
    }
    Now each element of @chunks will contain a chunk of the file up to 24 kB in size. However, if you must keep the lines intact, then you'll have to do some funky file pointer repositioning:
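    ...
    {
        # NOTE: code is untested
        local $/ = \24576;
        while (my $chunk = <$fh>) {
            # $/ is a record-size reference here, so look for "\n" itself.
            my $last_rs = rindex($chunk, "\n");
            # No newline, or nothing left to read: keep the chunk whole.
            if ($last_rs < 0 or eof($fh)) {
                push @chunks, $chunk;
                next;
            }
            push @chunks, substr($chunk, 0, $last_rs + 1);
            # seek takes (FH, POSITION, WHENCE); step back over the partial line.
            seek($fh, -(length($chunk) - $last_rs - 1), 1);
        }
    }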
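
    A variation that avoids seek altogether (handy if the input is a pipe rather than a file) is to carry the partial trailing line over into the next read. This is a different technique from the repositioning above, sketched under assumptions: the name chunks_on_line_boundaries and the $limit parameter are invented for illustration, the handle is assumed to deliver raw bytes, and a line longer than $limit simply gets split at the limit.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): return chunks of at
# most $limit bytes, each ending on a line boundary where possible.
sub chunks_on_line_boundaries {
    my ($fh, $limit) = @_;
    my @chunks;
    my $carry = '';    # partial line left over from the previous read
    while (1) {
        # Read only as much as still fits next to the carried-over text.
        my $want = $limit - length $carry;
        my $got  = read($fh, my $buf, $want);
        die "read failed: $!" unless defined $got;
        my $chunk = $carry . $buf;
        last if $chunk eq '';
        my $last_nl = rindex($chunk, "\n");
        if ($got < $want or $last_nl < 0) {
            # End of input, or one line fills the whole chunk: emit as is.
            push @chunks, $chunk;
            $carry = '';
            last if $got < $want;
        }
        else {
            # Emit up to and including the last newline, carry the rest.
            push @chunks, substr($chunk, 0, $last_nl + 1);
            $carry = substr($chunk, $last_nl + 1);
        }
    }
    return @chunks;
}

# Usage with an in-memory file handle and a tiny 10-byte limit.
my $text = "one\ntwo\nthree\nfour\n";
open(my $src, '<', \$text) or die "open: $!";
my @pieces = chunks_on_line_boundaries($src, 10);
printf "%d pieces\n", scalar @pieces;
```

    For the 24 kB case the call would be chunks_on_line_boundaries($fh, 24576).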

    HTH

    _________
    broquaint