in reply to lazy loading and memory usage

That suggests that when get_all_Genes() is called, the values in the returned array reference are handles to as-yet-unpopulated objects. That is, the anonymous array returned is filled with handles to objects that are, at the point of return, empty. They are not populated until the first method call is made upon them.
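In outline, the lazy-population pattern being described works something like the sketch below. This is an illustration only, assuming a hypothetical LazyGene class with a private _populate() method; it is not the actual internals of the module in question:

    package LazyGene;

    sub new {
        my ( $class, $id ) = @_;
        # The handle starts out empty: just enough state to fetch the data later.
        return bless { id => $id, populated => 0 }, $class;
    }

    sub stable_id {
        my ($self) = @_;
        $self->_populate() unless $self->{populated};    # first method call triggers population
        return $self->{data}{stable_id};
    }

    sub _populate {
        my ($self) = @_;
        # Hypothetical fetch of the full record; the sequence is the bulk of the memory cost.
        $self->{data} = {
            stable_id => "GENE$self->{id}",
            sequence  => 'ACGT' x 1_000_000,
        };
        $self->{populated} = 1;
    }

    1;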

Therefore, if you iterate that array in a for loop, each object is populated when you call its stable_id() method. By the end of the for loop, all the genes will have been populated, and because their handles are still held in the array, all the memory required by all of them will still be in use.

Conversely, in the while loop, each gene is again populated by the call to its stable_id() method, but because the object handle was shifted off the array, when the loop iterates that handle goes out of scope, allowing the object and all the memory required to hold its contents to be released.

With the while-shift method, only one gene from the array is ever populated at any given time, so the total memory usage is reduced.
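For reference, the while-shift idiom under discussion looks something like this sketch ($first_clone, get_all_Genes() and stable_id() as used elsewhere in this thread):

    my $genes = $first_clone->get_all_Genes();

    while ( my $gene = shift @{$genes} ) {
        print $gene->stable_id(), "\n";
        # $gene falls out of scope here; since it was shifted off the array,
        # no other reference remains and the populated object can be freed.
    }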

You could achieve the same thing--arguably more clearly--by using undef in conjunction with the for loop:

# Iterate through all of the genes on a clone
foreach my $gene ( @{ $first_clone->get_all_Genes() } ) {
    print $gene->stable_id(), "\n";
    undef $gene;    ## Free the gene object and the memory it uses.
}

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^2: lazy loading and memory usage
by Marshall (Canon) on Dec 19, 2010 at 00:53 UTC
    Nice Post. Would the following achieve the same thing without needing "undef"?
    # Iterate through all of the genes on a clone
    foreach my $gene ( @{ $first_clone->get_all_Genes() } ) {
        my $temp_obj_handle = $gene;
        print $temp_obj_handle->stable_id(), "\n";
    }
    Update: Again kudos to BrowserUk.

    Although $temp_obj_handle gets recycled and does "go out of scope", it points to the same object as $gene within the loop, and $gene does not go "out of scope". Therefore the object's reference count is not, at the end of the day, decremented to zero. End result: the above code does not save memory within the foreach() loop, although the code from BrowserUk does.
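    One way to see this is to watch the object's reference count; here is a sketch using the core B module, with plain hashrefs standing in for gene objects (exact counts may vary by Perl version):

        use B ();

        my $genes = [ {}, {} ];    # stand-ins for gene objects

        foreach my $gene ( @{$genes} ) {
            my $temp_obj_handle = $gene;    # adds one reference to the object...
            printf "refcount inside loop: %d\n",
                B::svref_2object($gene)->REFCNT;    # 2: the array slot plus $temp_obj_handle
        }    # ...and removes only that one reference when it goes out of scope

        # The reference held inside @$genes survives the loop, so the count
        # never reaches zero and nothing is freed.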

    At the end of the day, does this memory optimization within a for() loop matter at all? I think that it usually does not.

    Perl is excellent about recycling memory that it has used before. A typical Perl program reaches a maximum memory footprint and then just stays there (provided, of course, that you don't have memory leaks :-)). There are no explicit "garbage collection" calls as in Java or C#; memory is reclaimed by reference counting as soon as nothing refers to it. In short, this is fine:

    foreach my $gene ( @{ $first_clone->get_all_Genes() } ) {
        print $gene->stable_id(), "\n";
    }
    I would not worry until there are thousands of new objects being created. A few hundred? => no.
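    The deterministic reclamation described above can be observed directly: Perl frees an object the instant its reference count reaches zero, with no separate collector pass. A tiny illustrative sketch (Tracked is a made-up class):

        package Tracked;
        sub new     { return bless {}, shift }
        sub DESTROY { print "object freed\n" }

        package main;
        {
            my $obj = Tracked->new();
        }    # reference count hits zero here; DESTROY runs immediately
        print "after the block\n";    # prints after "object freed"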

    Update again:

    Well, as with many things in programming, judgment is required. Five objects might very well consume 1,483 MB of memory. I have no idea how much memory a particular gene object will consume once populated... it might be a lot; on the other hand, it might not be much. This is very application specific. I think this thread has pointed out how the memory allocation works, and the OP can decide what to do in a particular situation. I personally would use the simplest loop unless there is a reason not to. In other words, only make things more complicated when it is necessary to do so, and "necessary" is application specific.

      No. Because $temp_obj_handle and $gene both point to the same thing. Once you've populated the object one points at, you've populated the object the other points at as well.

      And as one reference remains in the anonymous array, once the object is populated, it remains populated until:

      1. all references to it go out of scope.

        Which is never, as the reference to the anon. array is at the package level.

      2. Or, it is explicitly freed, e.g. undef'd.
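      Incidentally, the undef trick in the earlier foreach example works because the loop variable aliases the array element itself, so undef $gene empties the slot in the anonymous array and drops that last reference. A small demonstration with plain hashrefs:

          my $genes = [ { id => 1 }, { id => 2 } ];

          foreach my $gene ( @{$genes} ) {
              # ... use $gene ...
              undef $gene;    # $gene aliases the array element, so this empties the slot
          }

          print defined $genes->[0] ? "still held\n" : "slot emptied\n";    # prints "slot emptied"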

      "At the end of the day, does this memory optimization within a for() loop matter at all? I think that it usually does not.

      ...

      I would not worry until there are thousands of new objects being created. A few hundred? => no."

      Discussion of the number of genes, and whether hundreds or thousands constitutes a number worth worrying about, is premature and presumptive until you know how big each gene is!

      Given that:

      1. Individual genes can be millions of characters in length--and that's when stored in raw string form, without any structuring or associated metadata (see the back-of-envelope sketch after this list).
      2. And the warning the OP is asking about comes from the authors of the module in question, who presumably know far more about its internals than we are party to.
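      To put point 1 in perspective (as flagged above), a rough back-of-envelope with purely illustrative numbers:

          # Illustrative numbers, not measurements:
          my $chars_per_gene = 2_000_000;    # "millions of characters" of raw sequence
          my $genes_in_ram   = 1_000;        # genes held populated at once
          printf "~%d MB for raw sequence alone\n",
              $chars_per_gene * $genes_in_ram / 1_000_000;    # ~2000 MB, before any object overhead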

      In general, worrying about such an optimisation might be unnecessary. But this is not a general warning: it is a very specific warning about a particular library, from the people who wrote that library and who are therefore best placed to know.

      I think it would be best to heed such warnings.

