in reply to Issue with cloning and large structure processing

You're going to have to show us a bit more of your code -- like where does %some_el come from? -- because on the face of it, 130MB is too big for a hash constructed from 8MB of data.

This creates an 8MB file of keys and values, loads them into a hash, and the total size is just 12MB:

c:\test>perl -E"printf qq[%014d: %014d\n], $_, $_ for 1..262144" >junk.dat

c:\test>dir junk.dat
10/04/2010  11:06         8,388,608 junk.dat

c:\test>perl -MDevel::Size=total_size -E"local$/; my %h = split ': ', <>; print total_size \%h;" junk.dat
12489744

Of course, if the 8MB contains more than just a flat hash structure, then the memory requirement will be more, but 10x more is stretching the imagination a bit. So, it probably comes down to what else you are doing in your code. Real code is always more likely to result in a resolution than pseudo-code.
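If each value is itself a small hash, the per-element overhead multiplies. A quick way to see that for yourself (a sketch only; exact numbers vary with perl version and platform):

    use strict;
    use warnings;
    use Devel::Size qw(total_size);

    # Same payload stored two ways: flat key => value pairs, versus each
    # value wrapped in an inner hash (one extra HV and its keys per element).
    my %flat   = map { sprintf('%014d', $_) => sprintf('%014d', $_) } 1 .. 10_000;
    my %nested = map {
        sprintf('%014d', $_) => { type => 'x', value => sprintf('%014d', $_) }
    } 1 .. 10_000;

    printf "flat:   %d bytes\n", total_size(\%flat);
    printf "nested: %d bytes\n", total_size(\%nested);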



Re^2: Issue with cloning and large structure processing
by scathlock (Initiate) on Apr 10, 2010 at 11:02 UTC
    OK, I will try, but it's difficult because of the length and logic of my code. First of all, what does my script do? It takes one line of structured text, parses that string, and stores the result in a hash that looks like:
     $VAR1 = {
              '0' => {
                       'type' => ...,
                       'value' => ...
                     },
              '1' => {
                       'type' => ..,
                       'value' => ...
                     },
              '2' => {
                       'type' => ...,
                       'value' => ...
                     },
              '3' => {
                       'type' => ...,
                       'value' => ...
                     },
              '4' => {
                       'lvalue' => ...,
                       'rvalue' => ...,
                       'rvalue_type' => ...,
                       'type' => ...
                     }
            };
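    Simplified, the parsing step builds something of this shape (a sketch only -- the token and field grammar here is made up, my real parser is longer):

        use strict;
        use warnings;
        use Data::Dumper;

        # Sketch: split one structured line into tokens and index them
        # by position; the grammar here is illustrative, not the real one.
        sub parse_line {
            my ($line) = @_;
            my %el;
            my $i = 0;
            for my $token (split /;/, $line) {
                my ($type, $value) = split /=/, $token, 2;
                $el{ $i++ } = { type => $type, value => $value };
            }
            return \%el;
        }

        print Dumper parse_line('num=1;op=+;num=2');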
    
    The aim of the script is to generate more (broadly similar) hashes, which I store in an array. The main action is the generation, and it looks like:
        my $elements_ref = shift; # reference to AoH which at the beginning contains one element
        
        my %el = %{$elements_ref->[0]}; # Take the first element of the AoH
        my @hitlist; # Stores hash keys that indicate hash elements I will change
        
        # ...
        # Fill @hitlist with hash keys that I need. Simple for loop.
        # ...
        
        my $k = @hitlist; # Size of @hitlist. In this case it is equal to 3
        
        my $iter = Algorithm::Combinatorics::variations_with_repetition($new, $k); # Give me all variations of some elements
        
        while (my $var = $iter->next)       
        {
            my $new_el = Storable::dclone(\%el);
            
            # Now, for each variation I will create (clone) original %el
            # and in $new_el I will substitute $k elements of hash indicated
            # by keys contained in @hitlist
            
            for (my $j = 0; $j < $k; $j++)
            {   
                my $hit = $hitlist[$j];
                
                # ...
                $new_el->... = $var->[$j];
                # ...
            }
            
            push @$elements_ref, $new_el; # Store new element in AoH
        } 
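    For reference, this is how the iterator behaves on toy data (3 symbols taken 2 at a time gives 3**2 = 9 variations):

        use strict;
        use warnings;
        use Algorithm::Combinatorics qw(variations_with_repetition);

        # Toy data: 3 symbols taken 2 at a time => 3**2 = 9 variations.
        my $iter = variations_with_repetition([0, 1, 2], 2);
        while (my $var = $iter->next) {
            print "@$var\n";
        }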
    
    And the problem is that when I finally store the AoH on disk it takes 8MB, but while generating it the script uses 130MB of RAM.
      • What's in $new?
      • How are you saving the AoH to disk?
      • Are you sure you are saving everything you are generating?
      • How are you measuring the size in memory?
      • If the process is completing, why are you concerned with how much memory it took?

        1. In most cases it is an array of ints, e.g. 0..25. The variations are fetched with an iterator, so that is not the bottleneck.
        2. Storable::store
        3. Yes.
        4. htop and the system monitor. They aren't accurate, but the difference is big.
        5. Because in some cases there can be more hashes; for example, it can take 3GB of RAM for 600MB on disk. That's too much memory.
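
        If the clones are only accumulated so they can be stored at the end, one way to cap the memory would be to stream each element to disk as it is generated instead of pushing it onto the AoH, so only one clone is live at a time. A sketch with Storable's nstore_fd/fd_retrieve (toy data, not tested against your structures):

            use strict;
            use warnings;
            use Storable qw(dclone nstore_fd fd_retrieve);

            my %el = ( 0 => { type => 'num', value => 1 } );   # toy element

            # Write phase: freeze each clone straight to the file.
            open my $out, '>:raw', 'elements.dat' or die "open: $!";
            for my $v (0 .. 9) {                    # stands in for $iter->next
                my $new_el = dclone(\%el);
                $new_el->{0}{value} = $v;           # the substitution step
                nstore_fd($new_el, $out);
            }
            close $out;

            # Read phase: fd_retrieve returns one element per call and
            # croaks at EOF, so eval turns that into loop exit.
            open my $in, '<:raw', 'elements.dat' or die "open: $!";
            while ( my $rec = eval { fd_retrieve($in) } ) {
                printf "value=%s\n", $rec->{0}{value};
            }
            close $in;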