petesmiley has asked for the wisdom of the Perl Monks concerning the following question:

I have a program with an enormous array which I'm using as a queue, so I started using Tie::File to get around my memory problem. But oddly it still continues to chew up memory. So I wrote a small snippet to test the problem (see below). Am I extremely confused? Can someone point out what I missed? I added the flush call hoping the cache was not being written out properly, but I didn't notice a big change. It just keeps eating memory.
#!/usr/bin/perl
use Tie::File;

my @item;
my $dbuncheck = tie(@item, 'Tie::File', 'test', memory => '100000') or die $!;
for (1..10000) {
    push @item, 'stuff' x 40;
    $dbuncheck->flush;
}

Replies are listed 'Best First'.
Re: Tie::File problem
by pg (Canon) on Jan 17, 2003 at 06:21 UTC
    I wrote a piece of code to help you understand how Tie::File works internally:
    test.pl:

    use Tie::File;
    use Data::Dumper;

    my $self = tie @array, "Tie::File", "test.pl";
    print Dumper($self);
    print $array[2];
    print Dumper($self);
    If you run it, the result could be (the sample data is from win 98):
    $VAR1 = bless( {
        'autochomp' => 1,
        'mode' => 258,
        'deferred_s' => 0,
        'autodefer_threshhold' => 3,
        'sawlastrec' => undef,
        'defer' => 0,
        'dw_size' => 2097152,
        'offsets' => [ 0 ],
        'deferred_max' => -1,
        'autodeferring' => 0,
        'recsep' => ' ',
        'rdonly' => '',
        'memory' => 2097152,
        'filename' => 'test.pl',
        'fh' => \*Tie::File::FH,
        'ad_history' => [],
        'autodefer' => 1,
        'deferred' => {},
        'autodefer_filelen_threshhold' => 65536,
        'recseplen' => 2,
        'cache' => bless( [
            bless( [ [ 0, $VAR1->{'cache'}, 0 ] ], 'Tie::File::Heap' ),
            {},
            2097152,
            0
        ], 'Tie::File::Cache' )
    }, 'Tie::File' );

    $VAR1 = bless( {
        'autochomp' => 1,
        'mode' => 258,
        'deferred_s' => 0,
        'autodefer_threshhold' => 3,
        'sawlastrec' => undef,
        'defer' => 0,
        'dw_size' => 2097152,
        'offsets' => [ 0, '16', '35' ],
        'deferred_max' => -1,
        'autodeferring' => 0,
        'recsep' => ' ',
        'rdonly' => '',
        'memory' => 2097152,
        'filename' => 'test.pl',
        'fh' => \*Tie::File::FH,
        'ad_history' => [],
        'autodefer' => 1,
        'deferred' => {},
        'autodefer_filelen_threshhold' => 65536,
        'recseplen' => 2,
        'cache' => bless( [
            bless( [
                [ 1, $VAR1->{'cache'}, 1 ],
                [ 0, 2, ' ' ]
            ], 'Tie::File::Heap' ),
            { '2' => 1 },
            2097152,
            2
        ], 'Tie::File::Cache' )
    }, 'Tie::File' );
    The size of the read cache and the deferred-write buffer might be big, but their upper limit is under control, and you can even set the size yourself via the memory option. This is not our major concern.

    Now let's look at that hash element called 'offsets'. That's where Tie::File stores the byte offsets of the file's lines, which it computes by scanning for the record separator (recsep, "\n" by default).

    Actually the author of this package uses memory quite carefully. The code does not populate the offsets array with the offsets of all lines; instead it only populates it up to the last line you have accessed.

    In our example, the offsets array contains only one element at the beginning; after we access the third line, it contains three elements.

    But this offsets array still grows, and especially so in your case: you are using the array as a queue, so the last line you access is the last line of the file, which means the offsets of all lines end up stored in memory.
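    You can watch this growth directly by peeking at the tied object's internal 'offsets' field (visible in the Dumper output above; it is an internal, not a public API, so treat this as a diagnostic sketch only):

```perl
use strict;
use warnings;
use Tie::File;

# Build a small test file of 1000 records.
open my $fh, '>', 'offsets_demo.txt' or die $!;
print $fh "line $_\n" for 1 .. 1000;
close $fh;

my @lines;
my $obj = tie @lines, 'Tie::File', 'offsets_demo.txt' or die $!;

my $first      = $lines[0];                     # touch only the first record
my $after_head = scalar @{ $obj->{offsets} };   # a handful of offsets so far

my $last       = $lines[999];                   # touch the last record
my $after_tail = scalar @{ $obj->{offsets} };   # now an offset for every line

print "offsets after reading first line: $after_head\n";
print "offsets after reading last line:  $after_tail\n";

untie @lines;
unlink 'offsets_demo.txt';
```

    Accessing only the head of the file keeps the offsets array tiny; accessing the tail forces Tie::File to record an offset for every line in between.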
      Ok, so it is definitely using less memory. So in order to get the full benefit of this module, I should use it as a stack; that way it only keeps track of the first few elements.

      Thankfully, that is an option, so I'll give it a try.

      Thanx chiefs

      follow up:
      It worked great :) This snippet hardly chews any memory. Thanx again.

      #!/usr/bin/perl
      use Tie::File;

      my @item;
      my $dbuncheck = tie(@item, 'Tie::File', 'test', memory => '100000') or die $!;
      for (1..30000) {
          unshift @item, 'stuff' x 40;
      }
Re: Tie::File problem
by runrig (Abbot) on Jan 16, 2003 at 23:15 UTC
    The Tie::File object keeps track of where every line starts as it reads the file (in an array reference), so as you push new lines onto the end of the file, that array grows. It has to read every line to reach the end of the file, so that it knows the index of the current last line; hence there is an entry in that array for every line in your file. That array then serves as a direct index into the lines of your file via seek.

    I don't know how easy it would be to add an option to make Tie::File 'forget' where some of the lines start, but that is what it would take. Maybe there could be an option to 'forget' where the first N lines started, or just keep track of every Nth line, either way, it would take some programming. (Update: Or better yet, an option to not even remember where any line starts until it "needs to know", so you could push to your heart's content and not eat up memory).

    Maybe Tie::File is not the answer for you at this point, if all you want to do is add lines to the end of a large file.
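    If appending is really all you need, a plain append-mode filehandle keeps memory flat no matter how large the file grows, since nothing about earlier lines is kept in memory. A minimal sketch (the file name is made up for illustration):

```perl
use strict;
use warnings;

my $file = 'queue_demo.txt';   # hypothetical file name
unlink $file;                  # start fresh for the demo

# Append records without Tie::File: no per-line offset
# bookkeeping, so memory use does not depend on file size.
open my $out, '>>', $file or die "can't open $file: $!";
print {$out} "item $_\n" for 1 .. 5;
close $out or die $!;
```

    The trade-off is that you lose random access to earlier lines, which Tie::File's offsets array exists to provide.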

Re: Tie::File problem
by Aristotle (Chancellor) on Jan 18, 2003 at 15:00 UTC