Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

This is a question about proper memory management. I want to make sure that I am not creating a memory leak and that I am using memory as efficiently as possible.

The task that I am working on is one where I need to do a lot of similar things in different scripts. To facilitate this I want to make a file with all the key tasks in it as subroutines and call them when needed from the other scripts. These tasks will use objects from various Perl packages. Since I will only need maybe 10 subroutines, I don't really want to do anything more complex than a file with a collection of subroutines in the same folder as my scripts. I want to keep it simple.

What I have done so far is to create a subroutine file called mynetutils.pl and put the following code in it.

use WWW::Mechanize;

sub getfile {
    my ($url, $file) = @_;
    my $mech = WWW::Mechanize->new();
    $mech->get($url, ':content_file' => $file);
    # delete $mech and test to see if it is deleted
    #$mech->die;
    #$mech->delete;
    #$mech->giberish;
    $mech->get($url, ':content_file' => ".\\afterdel.jpg");
}
1;

The code to call it is:

require ".\\mynetutils.pl";
# This downloads a nice pic of a sunset for testing the subroutine
&getfile("http://artwall.us/scenic/tropical/images/sunset.jpg", ".\\test.jpg");

If you have not noticed yet, this is written for Windows, and it does work. My concern is that creating the $mech object each time the subroutine is called will cause a build-up of $mech objects in memory. Possible solutions that I have considered are as follows:

1. Delete the object after use. The code that I used to test this is in the subroutine file. The attempts to delete the object are all commented out. If you uncomment them (one at a time) and run the code, you will see that the second download does not happen. However, when I look at the command prompt there is an error:

C:\>test.pl
Can't locate object method "delete" via package "WWW::Mechanize" at .\mynetutils.pl line 11.

I get the same message and no second download for all three lines when they are uncommented. Since gibberish should not be a defined method, I don't think I am actually deleting the object with either die or delete. So is there a way to kill the object?

2. My second thought is that if I can create a single $mech object and reuse it throughout the execution of the primary script, I could be sure of not cluttering up memory, and save time by not recreating the object on each run. If I place the my $mech ... (declaration and construction) line outside the subroutine, will the object be created once and be available for all the subroutines to use, thereby saving time and memory?

So those are two questions; one last question is: can I return this $mech object back to the primary script and manipulate it there without messing up the memory management? Basically, have a reference to the object from the subroutine file and from the primary script at the same time. Sorry if "reference" is the wrong word here; I learned OOP using Java, and in Java this would work. I could create the object once in the subroutine file when the script is first run; when a subroutine is called, return the object after 'loading' it with a download; manipulate it in the primary script; and, once finished with that object, call a subroutine to 'reload' the object with another download. So my last question is, "Is this possible?"
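What this describes might be sketched roughly as follows (a hypothetical rewrite of mynetutils.pl, not the poster's actual file). Because Perl counts references, returning $mech to the caller just gives the caller a second reference to the same object; nothing is copied:

```perl
# mynetutils.pl -- hypothetical sketch, assuming the structure described above
use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();   # created once, when the file is require'd

sub getfile {
    my ($url, $file) = @_;
    $mech->get($url, ':content_file' => $file);
    return $mech;                   # caller now holds a second reference to the same object
}
1;
```

In the primary script, `my $m = getfile($url, $file);` then lets you call methods like `$m->status` directly; the object is freed only after both the file-scoped $mech and the caller's $m are gone.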

Thanks for your time in advance, Chris

Replies are listed 'Best First'.
Re: Memory Management with Subroutines and Objects
by jethro (Monsignor) on Dec 28, 2010 at 19:49 UTC

    You seem to come from a language like C where you have to do memory management yourself. In most scripting languages the interpreter does that in the background without your intervention. The simple rule it uses is: "If there is no reference anymore pointing to some variable or object, reclaim the memory". This means that, for example, in this snippet

    if ($a == 3) {
        my $f = 'a' x 10000;   # after the block: no $f, no waste
    }

    $f is referencing a very long string (i.e. internally $f holds a reference to a memory location). But after the 'if' block there is no $f anymore, and so no reference to that long string. Ergo the memory gets reclaimed by Perl. The same would happen if you just did $f = '';. No reference, no memory.
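    This reclaim-on-last-reference behaviour can be observed directly with a DESTROY method, which Perl calls the moment an object's reference count reaches zero (a minimal demonstration sketch, using a made-up Demo class):

```perl
use strict;
use warnings;

package Demo;
sub new     { my $class = shift; return bless {}, $class }
sub DESTROY { print "object reclaimed\n" }   # runs when the refcount hits zero

package main;
my $obj = Demo->new();
print "before undef\n";
undef $obj;            # last reference gone: DESTROY fires here, not at program exit
print "after undef\n";
# prints: before undef / object reclaimed / after undef
```

    The same thing happens when $obj simply goes out of scope, which is why no explicit "delete" is needed.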

    There might still be memory loss because of circular references, but this is a bug, and you can tell the module author about it if you find such a case.

    Now if you suspect your module or WWW::Mechanize to have such a bug, or just want to make sure, you could write a test script like you did (good thinking, by the way). You just didn't test first whether there is a problem at all. Usually tests are written to check the worst cases and make sure that everything works regardless. Specifically, a good method is, if you find a bug, to first write a test that prints 'fail' because of the bug and then correct the bug until the test prints 'good'.

    So what you could have done is write your test so that it calls $mech->get repeatedly and checks the memory consumption. Only if the consumption really goes up would there be any reason to act and find a solution.
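    A crude version of such a test might look like this sketch. It reads the process's resident set size from /proc, so it is Linux-only (on the poster's Windows box, Task Manager or Process Explorer would serve the same purpose); the URL is a placeholder:

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Linux-only helper: current resident set size in kB, read from /proc.
sub rss_kb {
    open my $fh, '<', "/proc/$$/status" or die "cannot read /proc: $!";
    while (<$fh>) { return $1 if /^VmRSS:\s+(\d+)/ }
    return;
}

my $url = 'http://example.com/';    # placeholder URL for the test
for my $i (1 .. 20) {
    my $mech = WWW::Mechanize->new();   # fresh object each iteration, as in getfile()
    $mech->get($url);
    printf "iteration %2d: RSS %s kB\n", $i, rss_kb() // '?';
}
# If RSS keeps climbing instead of leveling off, something is holding references.
```

    Some growth in the first few iterations is normal (caches, lazily loaded modules); it is unbounded growth that would indicate a leak.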

      Yes, I was thinking of C-style memory management when I asked this question. So now that you have filled me in on how just cutting an object loose will free up the memory, I am good to go. What actually triggered this thought was that, while reading this article http://search.cpan.org/~jfearn/HTML-Tree-4.1/lib/HTML/Tree/Scanning.pod, I saw a line where the author called delete on the TreeBuilder object and commented that this cleared memory. So I started thinking about memory management. Thanks for your time.

Re: Memory Management with Subroutines and Objects
by Corion (Patriarch) on Dec 28, 2010 at 18:11 UTC

    Maybe you want to learn about how Perl manages memory? You don't need to manage memory yourself (as you would have seen by monitoring the memory usage of your program as it downloads some files).

    Also, if you want to call a method on an object, it helps to read the documentation as to what methods are supported. Neither ->delete nor ->die nor ->gibberish are documented there to work, so what makes you think they should work?

    As programs do not share memory in a way that both can modify it and see the modifications unless you make very specific preparations, you cannot pass Perl objects between programs (or "scripts" as you call them). Not even in Java can you do that, or at least, it would be just as clumsy as in Perl, by storing all payload data in a file on disk and then restoring the object from that payload data. I'm not sure what you intend to gain from such an approach.
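    The "store payload data in a file, restore it later" approach mentioned above can be sketched with the core Storable module. Note that this works for plain data structures; a live WWW::Mechanize object (with its open handles) would not necessarily survive serialization, so you would persist the data you extracted from it:

```perl
use strict;
use warnings;
use Storable qw(store retrieve);

# Writer script: persist plain payload data to disk.
my %payload = ( url => 'http://example.com/', status => 200 );
store \%payload, 'payload.stor';

# Reader script (could be a separate program run later):
my $restored = retrieve 'payload.stor';
print "fetched $restored->{url} with status $restored->{status}\n";
```

    This is exactly the kind of "very specific preparation" needed to move data between two separate processes; within one process, ordinary variables suffice.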

      Thanks for responding. The reason I thought that delete and die might work is that die is in the documentation as an internal method (so I thought it was worth a try), and I saw delete used in the documentation for HTML::TreeBuilder (so I thought that it may be a method common to all Perl objects, in the same way that there is a set of methods inherited by all Java objects from Java's Object class). Gibberish was a control on my experiment, as I did not expect it to be defined at all for anything.

      I assume this is a misunderstanding as far as sharing objects between programs goes. I do not want to do that; I just want to call routines from another file and have them return objects (with the added benefit of having a reference to that object in both the caller and the callee file). This will work in Java and is done pretty much as I described. If the code that I provided is saying something else, let me know; I do want to improve my Perl understanding.

      Again thanks for your time

        Ah, I see now that you are loading the "other script" via require. A more common approach would be to turn the first "script" into a Perl module, by naming it with a .pm extension and then loading it, still via require or use.

        As memory management in Perl is mostly automatic, you don't have to do anything special to release the WWW::Mechanize object (as its documentation does not mention anything special either). If you want to reuse the same WWW::Mechanize object over and over, just store it in a variable at script startup and then use it from there:

        package DownloadUtils;
        use strict;
        use warnings;
        use WWW::Mechanize;
        use vars '$mech';

        sub get_mech { WWW::Mechanize->new() };
        $mech = get_mech();

        sub get_file {
            my ($url, $file) = @_;
            $mech->get($url, ':content_file' => $file);
        };
        1;

        If you only simply want to download some URLs, LWP::Simple might be the easier approach. But it offers less convenience when it comes to cookies etc.
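        For plain downloads, the LWP::Simple route is one line per file; getstore returns the HTTP status code (same sunset URL as the original test, saved to a local file name of your choosing):

```perl
use strict;
use warnings;
use LWP::Simple qw(getstore is_success);

# Same download as the original getfile(), without a Mechanize object.
my $status = getstore(
    'http://artwall.us/scenic/tropical/images/sunset.jpg',
    'test.jpg',
);
print is_success($status) ? "saved\n" : "download failed: $status\n";
```

        Once you need cookies, form handling, or link following, switching back to WWW::Mechanize makes sense.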

Re: Memory Management with Subroutines and Objects
by sundialsvc4 (Abbot) on Dec 29, 2010 at 13:30 UTC

    It is not strictly true that “cutting loose all references to an object will cause it to be freed.”   When the reference count becomes zero, the memory becomes reclaimable, but it might be some time before the memory is reclaimed.   (And, even when this does occur, the “amount of memory consumed by this process,” as seen by the operating system, might not go down.)

    Usually, it makes the most sense to simply arrange the source code files in some sensible way, such that when Perl has to open one up and parse it, it only has to do so one time and, having done so, it “gets good bang for its buck.”   The time will be lost in opening the file and waiting a few milliseconds for the I/O transfer to take place, not in the handling of memory once this has been done.   (It’s all virtual memory, anyway, and the operating-system paging subsystem can take care of itself.)   Just define Perl objects and put them in .pm files, which you either (probably...) use or (maybe...) require, and in one swell foop you get a nice, easy-to-handle, easy-to-use “software thingy.”

    I’d say, and in the very nicest way possible, that you just might be thinking about this thing too much ... that you might be over-engineering it ... striving to overcome a problem that basically won’t occur.   Just “determine what The Perl Way is,” and then, “do it The Perl Way.”   Or, “c’mon in, the water’s fine.”

      There is one thing that you do need to review, and that is, “weak references.”   See also: Scalar::Util.

      If you construct a “self-referential” data structure, in which everything contains references to everything else, thus forming an “endless chain” of references, you will need to weaken one or more of those references in order to “break the chain.”   In other words, you are “deliberately creating a weak point in the structure,” so that the garbage collector will eventually be able to deduce that the data structure is eligible for reaping when no other references to the data remain other than the data’s own self-references to itself.   Without this (and only in this very specialized situation), a memory-leak can result.
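      Breaking such a cycle with Scalar::Util::weaken can be sketched like this (a minimal illustration with a made-up Node class; DESTROY is used only to show that reclamation actually happens):

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

package Node;
sub new     { my ($class, $name) = @_; return bless { name => $name }, $class }
sub DESTROY { print "$_[0]{name} reclaimed\n" }

package main;
{
    my $parent = Node->new('parent');
    my $child  = Node->new('child');
    $parent->{child} = $child;
    $child->{parent} = $parent;     # circular: parent <-> child
    weaken($child->{parent});       # the back-reference no longer counts toward the refcount
}
# Without weaken() neither object would ever be reclaimed;
# with it, both DESTROY calls fire as the block ends.
```

      The weakened reference still works like a normal one while the target exists; it simply turns to undef instead of keeping the target alive.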