memory use when merging tied hashrefs

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Re: memory use when merging tied hashrefs by Eily (Monsignor) on Nov 13, 2019 at 13:33 UTC
It might depend on which version of perl you are using, but one way to avoid the temporary list (`%$hash1, %$hash2` is flattened to a list of pairs before being added to the anonymous hash) is to iterate over the hashes using each. `while (my ($key, $value) = each %$hash1) { $hash3{$key} = $value; }` Same for $hash2. This also depends on how the tied hash is implemented, but it's worth a try... Edit: or you could also tie hash3 to fetch data from either hash1 or hash2 ... it really depends on what you are trying to achieve and what your constraints are.
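A minimal sketch of the each-based merge described above. Plain hashes stand in for the tied ones here (an assumption made so the example runs without a database); the loop itself is the same for a tied hash, though its actual memory behaviour depends on the tie implementation.

```perl
use strict;
use warnings;

# Plain hashes as stand-ins for the tied hashrefs (assumption for illustration).
my $hash1 = { apple => 1, pear => 2 };
my $hash2 = { pear => 20, plum => 30 };
my %hash3;

for my $src ($hash1, $hash2) {
    # each returns one key/value pair per call, so the source hash is
    # never flattened into a full temporary list.
    while (my ($key, $value) = each %$src) {
        $hash3{$key} = $value;
    }
}

# Sorted only to make the demo output stable.
printf "%s => %s\n", $_, $hash3{$_} for sort keys %hash3;
```

Because $hash2 is copied last, its value for a shared key (pear) replaces the one from $hash1, which matches the result of `{ %$hash1, %$hash2 }`.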
Re^2: memory use when merging tied hashrefs by Anonymous Monk on Nov 13, 2019 at 14:50 UTC
"you can iterate over the hashes using each" Nice! Memory doesn't budge when using each, but the program's slightly too slow, so I'll trade memory for speed. Good to know for a cloudy day. I did away with hash3 and merged hash2 into hash1: `while (my ($k,$v) = each %$hash2) { $hash1->{$k} = $v }` Thanks for the lesson.
Re^3: memory use when merging tied hashrefs by Eily (Monsignor) on Nov 13, 2019 at 17:18 UTC
Another possible middle-ground solution is to use a for loop on the keys. `$hash3->{$_} = $hash1->{$_} for keys %$hash1;` This potentially means that only the keys will be stored in a temporary list (not sure about the exact impact though, even more so when tied hashes are involved). Also, to improve the speed, you can force perl to preallocate a hash with `keys %hash = 200;`. That's a little bit tricky to use because the right value is the number of buckets in the hash, not the number of keys. See the end of Scalar values (in perldata) about that syntax. To choose the correct number, you can use Hash::Util::bucket_ratio() after doing the job once and getting an idea of the expected size of the hash (do that on the final hash, not a tied one though!). Without that step, perl may first allocate a hash that is too small, and need to copy it into a bigger one when you add too much data. I doubt this happens often with "only" 8000 keys, but it's always worth a try. (Thanks to choroba for helping find the relevant documentation :D)
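A rough sketch combining the two ideas above: preallocating the destination, then copying with a for loop over the keys. The source hash and the bucket count of 8192 are assumptions for illustration; perl rounds the requested bucket count up to the next power of two, and the right figure for real data would have to be measured.

```perl
use strict;
use warnings;

# Stand-in source with 8000 keys (assumption; the real source is a tied hash).
my $hash1 = { map { ("key$_" => $_) } 1 .. 8000 };

my %hash3;
# Ask for at least 8192 buckets up front. This is a bucket count,
# not a key count, and 8192 is only a guess for this demo.
keys(%hash3) = 8192;

# Only the list of keys is built temporarily; values are fetched one by one.
$hash3{$_} = $hash1->{$_} for keys %$hash1;

print scalar(keys %hash3), " keys copied\n";
```

Whether the preallocation is measurable at this size is an open question, as the post itself notes; timing both variants on the real data is the only reliable check.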
Re^3: memory use when merging tied hashrefs by Eily (Monsignor) on Nov 13, 2019 at 15:56 UTC
If $hash1 is tied to a database, does that mean you are adding the records from $hash2 into the first db?
Re^4: memory use when merging tied hashrefs by Anonymous Monk on Nov 13, 2019 at 20:22 UTC
Re: memory use when merging tied hashrefs by LanX (Saint) on Nov 13, 2019 at 13:40 UTC
I'm not sure I understand your question ... Let's decompose it into simpler steps. This `%H3 = (%H1,%H2)` is a list copy: `%H1` and `%H2` are temporarily flattened to a list of `(key,value)` pairs, joined, and used to construct a new hash. `%H3 = ( k1a => v1a, ... , k2a => v2a ,... )` The `$hash3 = { ... }` part only takes a reference to the hash constructed from that list. Not knowing the nature of your tie, this is the best I can tell. You might want to check whether flattening is what incurs the extra cost you observed with the tied hashes. Try `print %$hash2`. Cheers Rolf (addicted to the Perl Programming Language :)
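The flattening step can be made visible with ordinary hashes. This is only an illustration with made-up data; the explicit @flat array stands in for the temporary list perl builds internally.

```perl
use strict;
use warnings;

my %H1 = (a => 1, b => 2);
my %H2 = (b => 20, c => 30);

# Both hashes are flattened into one list of key/value pairs.
my @flat = (%H1, %H2);

# The new hash is built from that list; a later pair overwrites an earlier one.
my %H3 = @flat;

print scalar(@flat), " list elements for ", scalar(keys %H3), " keys\n";
```

The list holds every key and every value of both hashes at once, which is where the extra memory goes before the new hash even exists.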
Re: memory use when merging tied hashrefs by jcb (Parson) on Nov 14, 2019 at 00:54 UTC
Why are the hashes tied with `DB_File`? Are they a persistent datastore from which your program draws input? Does your program produce them, using disk instead of memory to reduce memory footprint? Will these databases grow further? You may need to tie `%hash3` to another `DB_File` and copy the contents key-by-key as other monks have suggested, or change your lookup code to check `%hash2` and then `%hash1` if the key is not found in `%hash2`, and eliminate `%hash3` entirely.
Re^2: memory use when merging tied hashrefs by Anonymous Monk on Nov 14, 2019 at 16:18 UTC
> Why are the hashes tied with DB_File?
I was searching through a lot of files that have to be processed before the search, which worked OK with hundreds of files, but with thousands the search is way faster when everything is preprocessed and stored in a database. DB_File is a very fast core perl module. I love it.
> Are they a persistent datastore from which your program draws input?
They are, created and updated automatically.
> Does your program produce them, using disk instead of memory to reduce memory footprint?
The program produces the database to avoid traversing the filesystem to search through file contents. I guess it uses memory to avoid roaming all over the disk, and it uses some disk space, but the DBM_Filter "compress" cuts that in half.
> Will these databases grow further?
They may grow automatically when they get accessed: the files they are based on are checked for changes and updates are made before read access is granted.
> You may need to tie %hash3 to another DB_File and copy the contents key-by-key as other monks have suggested, or change your lookup code to check %hash2 and then %hash1 if the key is not found in %hash2 and eliminate %hash3 entirely.
I've been realizing the first technique may be necessary. I prefer your second suggestion but the code resists that solution.
Re^3: memory use when merging tied hashrefs by jcb (Parson) on Nov 14, 2019 at 23:19 UTC
You could also "change your lookup code" by using Eily's advice and making a tied hash that encapsulates that search across a list of hashes. This would be extensible for adding more hashes as well, if your database grows more "tables" in the future. If the code only does lookups, you should only need to implement a `FETCH` method. Something like this (untested; the tied object is an array of hashrefs): `sub FETCH { my $self = shift; my $key = shift; foreach my $hash (@$self) { return $hash->{$key} if exists $hash->{$key} } return undef; }`
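A fuller sketch of that idea, under some assumptions: the package name ChainedLookup is hypothetical, a TIEHASH constructor is added because tie requires one, and EXISTS is included only so that `exists` works in the demo. Plain hashrefs stand in for the DB_File-tied ones.

```perl
use strict;
use warnings;

package ChainedLookup;   # hypothetical name, not an existing module

# The tied object is an array of hashrefs, searched in the order given.
sub TIEHASH {
    my ($class, @hashes) = @_;
    return bless [@hashes], $class;
}

# Return the value from the first hash that has the key.
sub FETCH {
    my ($self, $key) = @_;
    for my $hash (@$self) {
        return $hash->{$key} if exists $hash->{$key};
    }
    return undef;
}

# True if any of the underlying hashes has the key.
sub EXISTS {
    my ($self, $key) = @_;
    for my $hash (@$self) {
        return 1 if exists $hash->{$key};
    }
    return '';
}

package main;

# Stand-ins for the two tied hashrefs (assumption for illustration).
my $hash1 = { apple => 1, pear => 2 };
my $hash2 = { pear => 20, plum => 30 };

# $hash2 is listed first so it takes precedence, as in { %$hash1, %$hash2 }.
tie my %hash3, 'ChainedLookup', $hash2, $hash1;

print "pear  => $hash3{pear}\n";
print "apple => $hash3{apple}\n";
print "kiwi is missing\n" unless exists $hash3{kiwi};
```

Nothing is copied here, so memory use stays flat; the cost is up to one lookup per underlying hash on each access. Iterating over `%hash3` or writing to it would need further methods (FIRSTKEY, NEXTKEY, STORE) that this sketch leaves out.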