lagle has asked for the wisdom of the Perl Monks concerning the following question:

I have to iterate over huge masses of text containing information about different files, specifically phonetic information about human speech.

The file is ordered, so I iterate over it until the file name changes, and then I want to process the information collected about that one file.

Does my use of splice on the array of hashes free the memory allocated for "one" file, so that I can reuse it for the next one? Or will the hashrefs be orphaned?

Here's some sample code that uses the same technique:

    use strict;
    use warnings;

    my @words = ();
    my @cvs   = ();

    # push some hash data onto the arrays; in real life this would be
    # huge files with phonetic transcription, not VC toilets
    push @cvs, { file => 1, text => 'v' };
    push @cvs, { file => 1, text => 'c' };
    push @cvs, { file => 1, text => 'c' };
    push @cvs, { file => 2, text => 'c' };
    push @cvs, { file => 2, text => 'v' };
    push @cvs, { file => 2, text => 'c' };
    push @cvs, { file => 2, text => 'c' };

    push @words, { file => 1, text => 'üks' };
    push @words, { file => 2, text => 'kaks' };

    # for each word we have to find its constituents
    for my $w_i (0 .. $#words) {
        my %word          = %{ $words[$w_i] };
        my $cv_as_text    = '';
        my @cvs_to_delete = ();
        print "$w_i: $word{text}\n ";

        # now loop through the constituents and find the correct match
        for my $c_i (0 .. $#cvs) {
            my %cv = %{ $cvs[$c_i] };

            # in real life this is done with microseconds, not file nr
            if ($cv{file} == $word{file}) {
                $cv_as_text .= $cv{text};
                push @cvs_to_delete, $c_i;
                print $cv{text};
            }
            # we don't want to search to the end (data is ordered)
            elsif ($cv{file} > $word{file}) {
                last;
            }
        }
        print "\n";

        # now delete all cvs we extracted, highest index first;
        # sort numerically, since a plain sort compares indexes as strings
        @cvs_to_delete = sort { $b <=> $a } @cvs_to_delete;
        for my $del (@cvs_to_delete) {
            splice(@cvs, $del, 1);
        }
    }

Replies are listed 'Best First'.
Re: does splice'ing an array of hashes free memory?
by kcott (Archbishop) on Nov 01, 2010 at 09:29 UTC

    The documentation for splice indicates that "The array grows or shrinks as necessary." Whether actual memory is freed is another matter. Devel::Size provides functions for determining the memory usage of variables.

    -- Ken

Re: does splice'ing an array of hashes free memory?
by JavaFan (Canon) on Nov 01, 2010 at 10:13 UTC
    Yes, but splicing isn't a cheap operation. In fact, if @cvs_to_delete contains x% of the elements of @cvs (for arbitrary x), splicing them out one by one has an expected quadratic running time.

    It's better to use a slice:

    my %cvs_to_delete = map {($_, 1)} @cvs_to_delete;
    @cvs = @cvs[grep {!$cvs_to_delete{$_}} 0..$#cvs];
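
A minimal, self-contained sketch of the slice approach above. The sample records and the contents of @cvs_to_delete are assumptions made up for illustration. It builds a lookup hash of the indexes to drop and rebuilds @cvs in one pass, instead of shifting the array once per removed element:

```perl
use strict;
use warnings;

# Hypothetical sample data: five consonant/vowel records for two files.
my @cvs = (
    { file => 1, text => 'v' },
    { file => 1, text => 'c' },
    { file => 1, text => 'c' },
    { file => 2, text => 'c' },
    { file => 2, text => 'v' },
);

# Indexes already matched for file 1 (assumed to be collected earlier).
my @cvs_to_delete = (0, 1, 2);

# Build a lookup of the indexes to drop, then keep everything else
# with a single array slice.
my %cvs_to_delete = map { ($_, 1) } @cvs_to_delete;
@cvs = @cvs[ grep { !$cvs_to_delete{$_} } 0 .. $#cvs ];

print join(' ', map { $_->{file} . ':' . $_->{text} } @cvs), "\n";
```

Note that the grep still visits every index of @cvs, so this pays off mainly when many elements are removed at once.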

      You're right that it's expensive, but the grep method is even slower: the arrays are ~30000 elements long, the information I "extract" from them sits somewhere near the beginning of the array, and each extraction is only 1-15 elements long. The grep method iterates to the end of the array, which is what I'm trying to avoid.

      My first version just splice'd out $#cvs_to_delete elements in one go. It ran quickly, but I can't be sure it works.

      I'll try to make something that splices all consecutive rows in one sweep.

        Here's the code I ended up with for splicing consecutive indexes in one go. I have to say it saves me literally hours.

        # now we'll pop away what we found from the AoH
        # basically we splice out subsequent indexes in one go
        # and the poor lonelies have to go home alone

        # if nothing is to be popped, we don't pop
        unless (scalar @i_to_delete) {
            return $text;
        }

        @i_to_delete = sort {$b <=> $a} @i_to_delete; # descending order
        my $length = 1;
        my $offset = undef;
        for my $i (0 .. $#i_to_delete) {
            if ($i < $#i_to_delete) { # not the last index
                if (($i_to_delete[$i] - $i_to_delete[$i+1]) == 1) { # the two are subsequent
                    $length++;
                }
                else { # they are not subsequent, split current
                    $offset = $i_to_delete[$i];
                    splice(@$AoH, $offset, $length);
                    $length = 1;
                }
            }
            else { # last index
                $offset = $i_to_delete[$i];
                splice(@$AoH, $offset, $length);
                $length = 1;
            }
        }
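
The fragment above lives inside a larger sub that is not shown. As a rough, self-contained sketch of the same idea, the helper below removes runs of consecutive indexes with one splice per run. The name splice_runs and the sample data are hypothetical, and the sketch assumes the index list contains no duplicates:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the post above): remove the given indexes
# from the array referenced by $aoh, using one splice per run of
# consecutive indexes. Assumes the indexes are unique.
sub splice_runs {
    my ($aoh, @indexes) = @_;
    return unless @indexes;

    # Work from the highest index down so lower offsets stay valid.
    my @sorted = sort { $b <=> $a } @indexes;

    my $length = 1;
    for my $i (0 .. $#sorted) {
        if ($i < $#sorted && $sorted[$i] - $sorted[$i + 1] == 1) {
            $length++;    # still inside a consecutive run
            next;
        }
        # $sorted[$i] is the lowest index of the current run
        splice(@$aoh, $sorted[$i], $length);
        $length = 1;
    }
    return;
}

my @rows = map { +{ id => $_ } } 0 .. 9;
splice_runs(\@rows, 1, 2, 3, 7);

print join(' ', map { $_->{id} } @rows), "\n";
```

Here indexes 1 to 3 are removed with a single splice and index 7 with another, so the remaining ids are 0, 4, 5, 6, 8 and 9.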
Re: does splice'ing an array of hashes free memory?
by moritz (Cardinal) on Nov 01, 2010 at 09:14 UTC
    does my use of splice of the array of hashes free up allocated memory used for "one" file, so that i can use it for the next one?

    Yes.

Re: does splice'ing an array of hashes free memory?
by ikegami (Patriarch) on Nov 01, 2010 at 15:57 UTC

    People are answering about the size of the array, but you seem to be asking about the hashes.

    Or will the hashrefs be orphaned?

    A variable that becomes orphaned (i.e. whose refcount becomes zero) is freed automatically.

    Splicing one of the hash references out of the array frees the reference, which reduces the refcount on the hash, which frees the hash (as long as nothing else still refers to it).
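
One way to observe this, as a small sketch with made-up data: keep a weakened reference to one of the hashes (using weaken from the core Scalar::Util module) and check it before and after the splice. The variable names $watch, $alive_before and $alive_after are only illustrative:

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

my @cvs = (
    { file => 1, text => 'v' },
    { file => 2, text => 'c' },
);

# A weakened reference does not count towards the refcount, so it cannot
# keep the hash alive; Perl sets it to undef once the hash is freed.
my $watch = $cvs[0];
weaken($watch);

my $alive_before = defined $watch ? 1 : 0;

# Void context: the removed hash reference is discarded, not returned.
splice(@cvs, 0, 1);

my $alive_after = defined $watch ? 1 : 0;

print "before splice: ", ($alive_before ? "alive" : "freed"), "\n";
print "after splice:  ", ($alive_after  ? "alive" : "freed"), "\n";
```

If some other variable still held an ordinary (strong) reference to the hash, the splice would only lower the refcount and the hash would stay alive until that reference went away too.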

      Thank you! This seems to be correct, because perl only occupies up to 40 MiB while executing, even though the file it iterates over is well above that (approx. 1.5 GiB).