in reply to Re: Issue with cloning and large structure processing
in thread Issue with cloning and large structure processing

Ok, I will try, but it's difficult because of the length and logic of my code. First of all, what does my script do? It takes one line of structured string, parses that string, and stores the result in a hash that looks like:
 $VAR1 = {
          '0' => {
                   'type' => ...,
                   'value' => ...
                 },
          '1' => {
                   'type' => ..,
                   'value' => ...
                 },
          '2' => {
                   'type' => ...,
                   'value' => ...
                 },
          '3' => {
                   'type' => ...,
                   'value' => ...
                 },
          '4' => {
                   'lvalue' => ...,
                   'rvalue' => ...,
                   'rvalue_type' => ...,
                   'type' => ...
                 }
        };
The aim of the script is to generate more (similar in some way) hashes, which I store in an array. The main action is the generation, and it looks like:
    my $elements_ref = shift; # reference to AoH which at the beginning contains one element
    
    my %el = %{$elements_ref->[0]}; # Take the first element of the AoH
    my @hitlist; # Stores hash keys that indicate hash elements I will change
    
    # ...
    # Fill @hitlist with hash keys that I need. Simple for loop.
    # ...
    
    my $k = @hitlist; # Size of @hitlist; in this case it equals 3
    
    my $iter = Algorithm::Combinatorics::variations_with_repetition($new, $k); # Give me all variations of some elements
    
    while (my $var = $iter->next)       
    {
        my $new_el= Storable::dclone \%el;
        
        # Now, for each variation I will create (clone) original %el
        # and in $new_el I will substitute $k elements of hash indicated
        # by keys contained in @hitlist
        
        for (my $j = 0; $j < $k; $j++)
        {   
            my $hit = $hitlist[$j];
            
            # ...
            $new_el->... = $var->[$j];
            # ...
        }
        
        push @$elements_ref, $new_el; # Store new element in AoH
    } 
And the problem is that when, at the end, I write the AoH to disk it takes 8 MB, but while generating it the script uses 130 MB of RAM.

Re^3: Issue with cloning and large structure processing
by BrowserUk (Patriarch) on Apr 10, 2010 at 11:41 UTC
    • What's in $new?
    • How are you saving the AoH to disk?
    • Are you sure you are saving everything you are generating?
    • How are you measuring the size in memory?
    • If the process is completing, why are you concerned with how much memory it took?

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      1. In most cases it is an array of ints, e.g. 0..25. The variations are fetched with an iterator, so this is not the bottleneck.
      2. Storable::store
      3. Yes
      4. htop, system monitor. It isn't accurate but the difference is big
      5. Because in some cases there can be more hashes; for example, it can take 3 GB of RAM for 600 MB on disk. That's too much memory.

        Hm. All I can do at this point is confirm your findings.

        Constructing an @AoH of 5000 copies of %ENV in as memory-efficient a way as I know how reports ~9MB in memory and ~9MB on disk, but constructing it uses 40MB of RAM. I see no sign of leaks from Storable.

        c:\test>perl -MDevel::Size=total_size -MStorable=dclone,store -E"$#AoH=5e3; $AoH[$_] = dclone \%ENV for 0..$#AoH-1; <>; print total_size( \@AoH ); store \@AoH, 'junk.dat'; <>"
        9555355

        c:\test>dir junk.dat
        10/04/2010  13:20         9,090,025 junk.dat

        And loading it back into memory still reports ~9MB for the structure, but it requires 40MB to construct it:

        c:\test>perl -MDevel::Size=total_size -MStorable=dclone,retrieve -E"$AoH = retrieve 'junk.dat';<>;print total_size( $AoH );<>"
        9842331

        I don't know why that is, but I have a cogent speculation. It is the same problem that used to afflict Devel::Size, (still does afflict the "official" version as far as I'm aware!), and still afflicts Data::Dumper and other similar modules.

        That of using a hash internally to track which scalars have already been seen, so that when the data is restored, duplicated references to the same sub-structures are re-created as such, rather than as identical but separate copies. Whilst these tracking hashes are required for the module(s) to work correctly, they frequently grow to many times the size of the structures being serialised/deserialised.
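        To see why that tracking is needed (a minimal sketch of my own, not code from this thread): when two parts of a structure share the same underlying reference, the seen-hash is what lets Storable recreate the sharing instead of producing two independent copies.

        ```perl
        use strict;
        use warnings;
        use Storable qw(dclone);

        # Two hash entries share the same underlying array reference.
        my $shared = [ 1, 2, 3 ];
        my $data   = { a => $shared, b => $shared };

        my $copy = dclone $data;

        # The internal tracking hash lets Storable notice the duplicated
        # reference, so the sharing survives the round trip...
        print "sharing preserved\n" if $copy->{a} == $copy->{b};

        # ...while the clone is still fully independent of the original.
        print "deep copy\n" if $copy->{a} != $data->{a};
        ```

        The tracking hash needs one entry per reference seen, which is why it can dwarf a structure made of many small sub-hashes.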

        The code I implemented in my version of Devel::Size to perform this tracking function uses a fraction of the memory and is much faster. It could be used by Storable (and the other modules mentioned above) to great effect. But it would be a brave (and possibly foolhardy) man who attempted this as a patch, because the changes would be pervasive, and the testing requirements paramount. It would require the (up-front) blessings (and assistance) of the module owners to have a chance at succeeding.

        I could break out the tracking code from Devel::Size and make it available as some kind of library. If anyone was going to use it.

        The only quick solution I can offer, is that instead of storing/retrieving the entire AoH as a single Storable entity, you store and retrieve it as a single file containing a series of smaller Storable objects.

        As the overall size of each Storable creation would be a fraction of the total size, the size of the tracking hash would also be much smaller. And by saving and loading in a series of calls to Storable, the memory for the internal tracking hash would be re-used at each iteration, thereby further reducing the overall memory requirement hugely.

        If this approach appeals to you, I could put together some code to demonstrate it?
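        One way to sketch that chunked approach (my illustration, using Storable's documented store_fd/fd_retrieve, which write and read consecutive frozen objects on a single filehandle):

        ```perl
        use strict;
        use warnings;
        use Storable qw(store_fd fd_retrieve);

        # Stand-in data: an AoH of small hashes, like the OP's structures.
        my @AoH = map { { type => 'demo', value => $_ } } 0 .. 9;

        # Write the AoH as a series of small chunks appended to one file.
        # Each store_fd call gets its own (small, short-lived) tracking hash.
        my $chunk_size = 4;
        open my $out, '>', 'chunks.dat' or die $!;
        binmode $out;
        while (my @chunk = splice @AoH, 0, $chunk_size) {
            store_fd \@chunk, $out or die "store_fd failed";
        }
        close $out;

        # Read the chunks back until EOF, rebuilding the full array.
        my @restored;
        open my $in, '<', 'chunks.dat' or die $!;
        binmode $in;
        until (eof $in) {
            my $chunk = fd_retrieve $in;
            push @restored, @$chunk;
        }
        close $in;

        printf "restored %d elements\n", scalar @restored;  # restored 10 elements
        ```

        Note that the splice loop consumes @AoH as it writes; keep a copy first if you still need the array in memory. The filename and chunk size here are arbitrary choices for the demo.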
