
Re: Out of Memory Error -- Possible Leak?

by graff (Chancellor)
on Dec 14, 2005 at 04:05 UTC


in reply to Out of Memory Error -- Possible Leak?

The scenario implies that the HoHoA is already loaded at the point that you go into this three-level loop over the data structure, and you obviously have not run out of memory at that point. So there must be something about how you are trying to access the data structure that is causing perl to consume a lot more memory than was needed to store the original data structure.

You say the structure is "not sparse", but if loading the structure happens to take up, say, 80% of available memory, then you might end up going over the limit if you auto-vivify a relatively small percentage of previously non-existent cells in the overall structure.
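A minimal sketch of the autovivification effect described above. The `%hash` contents and key names here are made up for illustration and are not the OP's data. Reading through a missing first-level key, or looping over a missing cell as an array, is enough to create new entries:

```perl
use strict;
use warnings;

# Hypothetical HoHoA with a single populated cell.
my %hash = ( a => { b => [ 1, 2, 3 ] } );

# Reading through a missing first-level key creates that key
# (with an empty hash ref as its value) as a side effect.
my $missing = $hash{x}{y};

my $count_after_read = scalar keys %hash;    # 'a' plus the autovivified 'x'

# Using a missing cell as an array ref in a foreach (an lvalue context)
# creates the cell itself, holding an empty array ref.
foreach my $value ( @{ $hash{a}{z} } ) { }

my $z_exists = exists $hash{a}{z} ? 1 : 0;

print "top-level keys: $count_after_read, a/z exists: $z_exists\n";
```

Each stray entry is small, but across a large key set the extra hashes and arrays can add up.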

I wonder if it would help to try the loop like this:

for (my $i = 0; $i < scalar(@array); $i++) {
    my $subhash = $hash{$array[$i]};  # does not autovivify new $hash element
    next unless $subhash;  # equivalent to "exists($hash{$array[$i]})"
    for (my $j = $i; $j < scalar(@array); $j++) {
        my $subarray = $$subhash{$array[$j]};  # same comments as above
        next unless $subarray;
        foreach my $value ( @$subarray ) {
            # do great deeds
        }
    }
}

If the OP code actually works on a given (smaller) data set, it might turn out that this different form of the nested looping could produce the same output in less time (but I haven't tested that).

The point of this approach is that you can easily check whether a given hash element exists before trying to use its contents as an array ref, and skip it if it does not exist; this involves extra steps within the loops, but these could end up saving a lot of execution time overall. And if they save enough unnecessary memory consumption as well (making the difference between crashing and finishing), speed may be a lesser concern.

update: I think the code as suggested above would be "good enough", if you're confident about how the data structure was being created, but I would be tempted to be more careful:

next unless ( ref( $subhash ) eq 'HASH' );
...
next unless ( ref( $subarray ) eq 'ARRAY' );
...
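For anyone who wants to try the guarded loop in isolation, here is a self-contained sketch with the `ref` checks in place. The sample `@array` and `%hash` are hypothetical, and the counter stands in for whatever the real loop body does. It counts keys afterwards to confirm that the traversal added nothing:

```perl
use strict;
use warnings;

# Hypothetical sample data: only some (i, j) cells are populated.
my @array = qw(a b c);
my %hash  = (
    a => { a => [ 1, 2 ], c => [ 3 ] },
    c => { c => [ 4 ] },
);

my $visited = 0;
for my $i ( 0 .. $#array ) {
    my $subhash = $hash{ $array[$i] };    # plain read: creates nothing
    next unless ref($subhash) eq 'HASH';
    for my $j ( $i .. $#array ) {
        my $subarray = $subhash->{ $array[$j] };
        next unless ref($subarray) eq 'ARRAY';
        $visited += scalar @$subarray;    # stand-in for the real work
    }
}

# Count keys and cells afterwards to confirm nothing was added.
my $top_keys = scalar keys %hash;
my $cells    = 0;
$cells += scalar keys %$_ for values %hash;

print "visited $visited values; top-level keys: $top_keys; cells: $cells\n";
```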

Replies are listed 'Best First'.
Re^2: Out of Memory Error -- Possible Leak?
by Anonymous Monk on Dec 14, 2005 at 06:50 UTC

    Hello, thanks for the suggestions!

    I think I should have been more specific -- when I said "not sparse", I actually meant that the data structure had been fully created in advance with a loop exactly like this. I've added in the checks, and before the program crashes (after about 3 hours of CPU) it does not ever autovivify.

    One thing that goes on inside the #do great deeds is to change values in the Array part of the HoHoA, and I had expected that to be the cause of my memory leak. I'm really confuzzled as to why this:

    foreach my $value ( @$subarray ) {

    would eat up memory. I would have guessed it would use one scalar's worth (perhaps 15 bytes of data), and that this memory would get reused over & over again. Instead, the script runs on this loop for a few hours before using up all the memory.

      I'm really confuzzled as to why this:

      foreach my $value ( @$subarray ) {

      would eat up memory.

      That line expands the contents of the array into a list (i.e., on Perl's stack). If the subarray pointed to by $subarray is large, it will consume a large amount of memory to do that.

      You can avoid that by indexing the array, rather than aliasing its elements. E.g.

      for my $index ( 0 .. $#$subarray ) {
          my $value = $subarray->[ $index ];
          ...
      }

      Or, if you need to modify the elements, substitute $subarray->[ $index ] wherever you are currently using $value.
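A small sketch of the index-based variant that modifies elements in place. The three-element `$subarray` is a hypothetical stand-in for one cell of the HoHoA:

```perl
use strict;
use warnings;

my $subarray = [ 1, 2, 3 ];    # hypothetical stand-in for one HoHoA cell

for my $index ( 0 .. $#$subarray ) {
    $subarray->[$index] *= 10;    # modifies the element in place
}

print "@$subarray\n";
```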


        That line expands the contents of the array into a list (i.e., on Perl's stack). If the subarray pointed to by $subarray is large, it will consume a large amount of memory to do that.

        But... but... wait a minute. I thought that in this sort of loop:

        for $value ( @array ) {  # or ( @$arrayref )
            # do something with $value
        }
        the "$value" is just being used as a "magical" reference to the actual values in the array. If you do something like $value++ inside the loop, the values of the original array are incremented in place -- the increment doesn't apply to copies of the array elements.

        So, why would that sort of loop consume extra memory? Why would it make a separate list from @array (or from the actual array pointed to by @$arrayref), when $value is being used as a reference to each array element in turn? (If it really does take up extra memory, then I'm just making clear how little I understand about the underlying implementation of for loops in perl, and if someone explains it to me, I'll probably be better off at some point...)
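The aliasing behaviour described above is easy to confirm with a throwaway array (the values are arbitrary). This only shows that `$value` refers to the original elements; it says nothing either way about how much stack space the loop uses:

```perl
use strict;
use warnings;

my @array = ( 1, 2, 3 );

for my $value (@array) {
    $value++;    # $value is an alias, so this changes @array itself
}

print "@array\n";
```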

        The dialog so far leads me to think that the AM has some other trap or "gotcha" inside that nested loop, which hasn't been shown yet, and which is probably causing some horribly unintended consequence (e.g. something like the now-infamous sprintf bug).

        (updated to add link to Format string vulnerability)

        Another update: Okay, your reference to "Perl's stack" is probably the part I hadn't understood before: in order to process (queue up) the elements to be iterated over, the for loop has to push the whole set of elements onto a stack. But then, I would presume that in the AM's situation, with that particular for loop being done so many times, Perl would be re-using that stack space.

        Obviously, if the array in question is really big, the stack could boil over -- OOM -- at the first attempt on the inner-most for loop. This seems consistent with the reported symptoms, and iterating over an array index instead of array values might fix it.

        Okay, I'm curious: does this matter? In my case the foreach loop is going over arrays that have ~100 elements. In this case, I would have expected the maximum memory usage to be 100 x (element size + overhead). That doesn't seem too large, but it assumes that all memory is getting smoothly cleared after the foreach loop is finished. Is that a valid assumption?

Re^2: Out of Memory Error -- Possible Leak?
by Anonymous Monk on Dec 16, 2005 at 01:14 UTC

    YES YES YES!!!

    No matter how sure I was about the data structure being properly created, I decided to put code like this inside my loop:

    if ( exists( $data{$array[$i]}{$array[$j]} ) ) {
        next();
    }
    else {
        print OUT join("\t", $i, $j), "\n";
    }

    and... wouldn't ya know it, there *were* elements being autovivified. I've fixed that problem, and now the code appears to be running correctly (appears = it's been running for a day without crashing yet -- it still has another couple of days to run, though).
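One caveat worth keeping in mind with a check like this: a two-level `exists` can itself autovivify the first level when that key is missing. The sketch below uses made-up data and key names to show the effect, along with a level-by-level test that avoids it:

```perl
use strict;
use warnings;

my %data = ( a => { b => [1] } );    # hypothetical sample data

# A two-level exists() has to look inside $data{x}, so Perl creates
# $data{x} as an empty hash ref before testing for the 'y' key.
my $found     = exists $data{x}{y} ? 1 : 0;
my $x_created = exists $data{x}    ? 1 : 0;

# Testing one level at a time short-circuits before any dereference.
my $found_safe = ( exists $data{q} && exists $data{q}{y} ) ? 1 : 0;
my $q_created  = exists $data{q} ? 1 : 0;

print "found=$found x_created=$x_created ",
    "found_safe=$found_safe q_created=$q_created\n";
```

If every first-level key is known to be present, as in a fully pre-built structure, the two-level form creates nothing new.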

    Many thanks -- this has led me to one key problem in the code. Any monks who haven't done so yet, please ++ the parent.
