PerlMonks  

memory use when merging tied hashrefs

by Anonymous Monk
on Nov 13, 2019 at 13:27 UTC [id://11108628]

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have a program that deals with two hashes. $hash1 has 800 keys and $hash2 has 8000 keys. I noticed the memory use spikes for a bit when merging the hashes:

$hash3 = { %$hash1, %$hash2 };

Is it the deref that eats memory? The hashes are each tied to a different DB_File; maybe there's a better way? Thanks

Re: memory use when merging tied hashrefs
by Eily (Monsignor) on Nov 13, 2019 at 13:33 UTC

    It might depend on which version of perl you are using, but one way to avoid building a temporary list (%$hash1, %$hash2 is flattened into a list of key/value pairs before being added to the anonymous hash) is to iterate over the hashes using each:

    while (my ($key, $value) = each %$hash1) { $hash3->{$key} = $value; }
    Do the same for $hash2. This also depends on how the tied hash is implemented, but it's worth a try.
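    Here is a minimal runnable sketch of the each() approach. It assumes plain in-memory hashes in place of the DB_File ties, so it runs without any database files. Both sources are merged into $hash3 one pair at a time. $hash2 is processed last, which keeps the same precedence as { %$hash1, %$hash2 }.

```perl
use strict;
use warnings;

# Plain hashes stand in for the DB_File-tied ones (assumption for this sketch).
my $hash1 = { a => 1, b => 2 };
my $hash2 = { b => 20, c => 30 };
my $hash3 = {};

# Copy one pair at a time with each(), so no flattened list of all the
# pairs is ever built. $hash2 is processed last, so its values win on
# duplicate keys, as they do with { %$hash1, %$hash2 }.
for my $src ($hash1, $hash2) {
    keys %$src;    # reset the each() iterator in case it was left mid-way
    while (my ($key, $value) = each %$src) {
        $hash3->{$key} = $value;
    }
}

print "$_ => $hash3->{$_}\n" for sort keys %$hash3;
```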

    Edit: or you could also tie hash3 so that it fetches data from either hash1 or hash2... It really depends on what you are trying to achieve and what your constraints are.

      > you can iterate over the hashes using each

      Nice! Memory doesn't budge when using each, but it makes the program slightly too slow, so I'll trade memory for speed. Good to know for a rainy day. I did away with hash3 and merged hash2 into hash1:

      while (my ($k,$v) = each %$hash2) { $hash1->{$k} = $v } 
      
      Thanks for the lesson

        Another possible middle-ground solution is to use a for loop on the keys:

            $hash3->{$_} = $hash1->{$_} for keys %$hash1;

        This potentially means that only the keys will be stored in a temporary list (I'm not sure about the exact impact though, even more so when tied hashes are involved).
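        A sketch of the keys() variant follows, again with hypothetical plain hashes standing in for the tied ones. Only the list of keys is built up front. Each value is read inside the loop, which for a tied hash means one FETCH per key.

```perl
use strict;
use warnings;

# Hypothetical stand-in data; in the thread these are DB_File-tied hashrefs.
my $hash1 = { x => 'one', y => 'two' };
my $hash2 = { y => 'TWO', z => 'three' };
my $hash3 = {};

# keys() builds a temporary list of keys only; each value is then
# looked up one at a time inside the loop.
for my $src ($hash1, $hash2) {
    $hash3->{$_} = $src->{$_} for keys %$src;
}

print join(', ', map { "$_=$hash3->{$_}" } sort keys %$hash3), "\n";
```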

        Also, to improve the speed, you can force perl to preallocate a hash with keys %hash = 200;. That's a little bit tricky to use because the right-hand value is the number of buckets in the hash, not the number of keys (perl rounds it up to the next power of two). See the end of the "Scalar values" section of perldata about that syntax. To choose the correct number, you can use Hash::Util::bucket_ratio() after doing the job once, to get an idea of the expected size of the hash (do that on the final hash, not on a tied one though!). Without that step, perl may first allocate a hash that is too small. It then has to grow it as you add data, redistributing the existing keys into a bigger bucket array each time. I doubt this makes a big difference with "only" 8000 keys, but it's always worth a try.
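        Here is a sketch of preallocation. It assumes perl 5.26 or later, where Hash::Util provides bucket_ratio(), and uses made-up test data of 8000 keys. The bucket count passed to keys() is an illustrative guess, not a measured value.

```perl
use strict;
use warnings;
use Hash::Util ();   # bucket_ratio() assumed available (perl 5.26+)

# Hypothetical source data: 8000 keys, roughly the size of $hash2 in the question.
my %source = map { ("key$_" => $_) } 1 .. 8000;

my %merged;
# Lvalue keys() preallocates buckets (rounded up to a power of two),
# so the hash starts with at least 8192 buckets before any insert.
keys(%merged) = 8192;

$merged{$_} = $source{$_} for keys %source;

# bucket_ratio() reports "used/total" buckets; the total is the figure
# to feed back into the keys() assignment on later runs.
my $ratio = Hash::Util::bucket_ratio(%merged);
print "buckets used/total: $ratio\n";
```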

        (Thanks to choroba for helping find the relevant documentation :D)

        If $hash1 is tied to a database, does that mean you are adding the records from $hash2 into the first db?

Re: memory use when merging tied hashrefs
by LanX (Saint) on Nov 13, 2019 at 13:40 UTC
    I'm not sure I understand your question ...

    Let's break it down into simpler steps.

    This

     %H3 = (%H1,%H2)

    is a list copy: %H1 and %H2 are temporarily flattened into a list of (key, value) pairs, concatenated, and used to construct a new hash.

     %H3 = ( k1a => v1a, ... , k2a => v2a ,... )

    The $hash3 = { ... } part just builds an anonymous hash from that same list and stores a reference to it in $hash3.
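    The three steps can be made visible with ordinary hashes (the data here is made up). When both hashes share a key, the pair that comes later in the list wins, which is the one from %H2.

```perl
use strict;
use warnings;

my %H1 = (k1a => 'v1a', shared => 'from H1');
my %H2 = (k2a => 'v2a', shared => 'from H2');

# Step 1: flatten both hashes into one temporary list of key/value pairs.
my @pairs = (%H1, %H2);
print scalar(@pairs), " elements in the temporary list\n";   # 8

# Step 2: build a new hash from that list; for a repeated key the
# later pair (from %H2) overwrites the earlier one.
my %H3 = @pairs;

# Step 3: { LIST } does steps 1 and 2 and returns a reference to the result.
my $hash3 = { %H1, %H2 };

print "$H3{shared} / $hash3->{shared}\n";   # from H2 / from H2
```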

    Without knowing the nature of your tie, this is the best I can tell.

    You might want to check whether the flattening itself, which goes through the tied operations, is what creates the extra cost you observed.

    Try print %$hash2; on its own.
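    One way to see what flattening does on a tied hash is to count the tied method calls. This sketch uses a hypothetical CountingHash class built on Tie::StdHash (from the core Tie::Hash module) as a stand-in for DB_File. It only shows that every value goes through FETCH when the hash is flattened. It does not measure how much memory DB_File itself uses.

```perl
use strict;
use warnings;
use Tie::Hash;   # provides Tie::StdHash

# Hypothetical helper class: an in-memory tied hash that counts FETCH calls.
package CountingHash;
our @ISA = ('Tie::StdHash');
our $fetches = 0;
sub FETCH { my ($self, $key) = @_; $fetches++; return $self->{$key} }

package main;

tie my %h, 'CountingHash';
%h = map { ("k$_" => $_) } 1 .. 5;

$CountingHash::fetches = 0;
my @flat = %h;   # flattening walks every key and fetches every value
print "FETCH calls while flattening: $CountingHash::fetches\n";
```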

    Cheers Rolf
    (addicted to the Perl Programming Language :)

Re: memory use when merging tied hashrefs
by jcb (Parson) on Nov 14, 2019 at 00:54 UTC

    Why are the hashes tied with DB_File? Are they a persistent datastore from which your program draws input? Does your program produce them, using disk instead of memory to reduce memory footprint? Will these databases grow further?

    You may need to tie %hash3 to another DB_File and copy the contents key-by-key as other monks have suggested, or change your lookup code to check %hash2 and then %hash1 if the key is not found in %hash2, and eliminate %hash3 entirely.
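    Here is a sketch of the second option. It assumes plain hashes in place of the tied ones and uses a hypothetical lookup() helper. %hash2 is checked first, so it takes precedence just as it does in { %$hash1, %$hash2 }.

```perl
use strict;
use warnings;

# Stand-in data; in the thread both would be DB_File-tied hashes.
my %hash1 = (apple => 'red',   pear => 'green');
my %hash2 = (apple => 'crisp', plum => 'purple');

# Hypothetical helper: look in %hash2 first, then fall back to %hash1,
# so no merged %hash3 ever has to be built.
sub lookup {
    my ($key) = @_;
    return $hash2{$key} if exists $hash2{$key};
    return $hash1{$key} if exists $hash1{$key};
    return;    # not found in either hash
}

for my $key (qw(apple pear kiwi)) {
    my $value = lookup($key);
    printf "%s => %s\n", $key, defined $value ? $value : '(not found)';
}
```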

      > Why are the hashes tied with DB_File?

      I was searching through a lot of files that have to be processed before the search. That worked OK with hundreds of files, but with thousands the search is much faster when everything is preprocessed and stored in a database. DB_File is a very fast core perl module. I love it.

      > Are they a persistent datastore from which your program draws input?

      They are; they're created and updated automatically.

      > Does your program produce them, using disk instead of memory to reduce memory footprint?

      The program produces the database to avoid traversing the filesystem and searching through file contents. I guess it uses memory to avoid roaming all over the disk, and it does use some disk space, but the DBM_Filter "compress" filter cuts that in half.

      > Will these databases grow further?

      They may grow automatically when accessed: before granting read access, the program checks the files they are based on for changes and applies any updates.

      > You may need to tie %hash3 to another DB_File and copy the contents key-by-key as other monks have suggested, or change your lookup code to check %hash2 and then %hash1 if the key is not found in %hash2, and eliminate %hash3 entirely.

      I've been realizing the first technique may be necessary. I prefer your second suggestion but the code resists that solution.

        You could also "change your lookup code" by using Eily's advice and making a tied hash that encapsulates that search across a list of hashes. This would also extend to more hashes if your database grows more "tables" in the future. If the code only does lookups, you should only need to implement TIEHASH (to create the object) and FETCH, plus EXISTS if the code calls exists. Something like this (untested; the tied object is an array of hashrefs):

            sub FETCH {
                my $self = shift;
                my $key  = shift;
                foreach my $hash (@$self) {
                    return $hash->{$key} if exists $hash->{$key};
                }
                return undef;
            }
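        Here is a fleshed-out, runnable version of that idea. It uses a hypothetical class name, HashChain, with plain hashrefs standing in for the DB_File ties. Besides FETCH, it implements TIEHASH, which tie needs to construct the object, and EXISTS for exists checks. It does not implement FIRSTKEY/NEXTKEY, so calling keys or each on the tied hash would die. You would need to add those if the code iterates.

```perl
use strict;
use warnings;

# Hypothetical class: the tied object is an array of hashrefs, searched
# in order; the first hash that has the key wins.
package HashChain;

sub TIEHASH {
    my ($class, @hashes) = @_;
    return bless [@hashes], $class;
}

sub FETCH {
    my ($self, $key) = @_;
    foreach my $hash (@$self) {
        return $hash->{$key} if exists $hash->{$key};
    }
    return undef;
}

sub EXISTS {
    my ($self, $key) = @_;
    foreach my $hash (@$self) {
        return 1 if exists $hash->{$key};
    }
    return '';
}

package main;

my $hash1 = { a => 1, b => 2 };
my $hash2 = { b => 20, c => 30 };

# List $hash2 first so it takes precedence, as in { %$hash1, %$hash2 }.
tie my %hash3, 'HashChain', $hash2, $hash1;

print "$hash3{a} $hash3{b} $hash3{c}\n";   # 1 20 30
```

        Listing $hash2 before $hash1 in the tie call gives it precedence. Supporting another database is just one more argument to tie.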

Approved by marto