Re: memory use when merging tied hashrefs
by Eily (Monsignor) on Nov 13, 2019 at 13:33 UTC
It might depend on which version of perl you are using, but one way to avoid building a temporary list (%$hash1, %$hash2 is flattened into one list of key/value pairs before being added to the anonymous hash) is to iterate over the hashes with each:
while (my ($key, $value) = each %$hash1) {
    $hash3{$key} = $value;
}
The same goes for %$hash2. This also depends on how the tied hash is implemented, but it's worth a try...
Edit: or you could also tie %hash3 to fetch data from either %hash1 or %hash2 ... it really depends on what you are trying to achieve and what your constraints are.
while (my ($k,$v) = each %$hash2) { $hash1->{$k} = $v }
Thanks for the lesson
Another possible middleground solution is to use a for loop on the keys.
$hash3->{$_} = $hash1->{$_} for keys %$hash1;
This potentially means that only the keys will be stored in a temporary list (I'm not sure about the exact impact, though, even more so when tied hashes are involved).
Also, to improve speed, you can force perl to preallocate a hash with keys(%hash) = 200;. That's a little bit tricky to use because the right value is the number of buckets in the hash, not the number of keys. See the end of Scalar values about that syntax. To choose the correct number, you can use Hash::Util::bucket_ratio() after doing the job once and getting an idea of the expected size of the hash (do that on the final hash, not a tied one though!). Without that step, perl may first allocate a hash that is too small and have to copy it into a bigger one when you add too much data. I doubt this happens often with "only" 8000 keys, but it's always worth a try.
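A minimal sketch of that preallocation, assuming a reasonably recent perl (bucket_ratio() was added to Hash::Util around 5.26) and made-up numbers — measure your real data first:

```perl
use strict;
use warnings;
use Hash::Util qw(bucket_ratio);

my %hash;

# Preallocate buckets before filling the hash; perl rounds
# the number up to the next power of two (here, 256).
keys(%hash) = 200;

# Illustrative fill with 150 dummy keys.
$hash{"key$_"} = $_ for 1 .. 150;

# bucket_ratio() reports "used/total" buckets, e.g. "118/256".
print bucket_ratio(%hash), "\n";
```

Running the job once and inspecting that ratio on the final hash tells you roughly how many buckets to ask for next time.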
(Thanks to choroba for helping find the relevant documentation :D)
Re: memory use when merging tied hashrefs
by LanX (Saint) on Nov 13, 2019 at 13:40 UTC
I'm not sure I understand your question ...
Let's decompose it into simpler steps.
This
%H3 = (%H1,%H2)
is a list copy: %H1 and %H2 are temporarily flattened into a list of (key,value) pairs, joined, and used to construct a new hash.
%H3 = ( k1a => v1a, ... , k2a => v2a ,... )
The $hash3 = { ... } part then just takes a reference to the newly constructed anonymous hash.
Not knowing the nature of your tie, this is the best I can tell.
You might want to check if flattening creates the special costs observed via tied operations.
try print %$hash2
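To illustrate the flattening with some hypothetical data (note that on duplicate keys the later pair in the list — i.e. the one from %H2 — wins):

```perl
use strict;
use warnings;

my %H1 = ( a => 1, b => 2 );
my %H2 = ( b => 20, c => 30 );

# Both hashes are flattened into one long list of pairs,
# ( a => 1, b => 2, b => 20, c => 30 ), and that list is
# used to build %H3. Later pairs override earlier ones.
my %H3 = ( %H1, %H2 );

print "$_ => $H3{$_}\n" for sort keys %H3;
# a => 1
# b => 20
# c => 30
```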
Re: memory use when merging tied hashrefs
by jcb (Parson) on Nov 14, 2019 at 00:54 UTC
Why are the hashes tied with DB_File? Are they a persistent datastore from which your program draws input? Does your program produce them, using disk instead of memory to reduce memory footprint? Will these databases grow further?
You may need to tie %hash3 to another DB_File and copy the contents key-by-key as other monks have suggested, or change your lookup code to check %hash2 and then %hash1 if the key is not found in %hash2, and eliminate %hash3 entirely.
> Why are the hashes tied with DB_File?
I was searching through a lot of files that have to be processed before the search. That worked OK with hundreds of files, but with thousands the search is way faster when everything is preprocessed and stored in a database. DB_File is a very fast core perl module. I love it.
> Are they a persistent datastore from which your program draws input?
They are, created and updated automatically.
> Does your program produce them, using disk instead of memory to reduce memory footprint?
The program produces the database to avoid traversing the filesystem to search thru file contents. I guess it uses memory to avoid roaming all over the disk, and it uses some disk space but the DBM_Filter "compress" cuts that in half.
> Will these databases grow further?
They may grow automatically when they get accessed: before granting read access, the program checks the files they are based on for changes and makes updates.
> You may need to tie %hash3 to another DB_File and copy the contents key-by-key as other monks have suggested, or change your lookup code to check %hash2 and then %hash1 if the key is not found in %hash2, and eliminate %hash3 entirely.
I've been realizing the first technique may be necessary. I prefer your second suggestion but the code resists that solution.
You could also "change your lookup code" by using Eily's advice and making a tied hash that encapsulates that search across a list of hashes. This would be extensible for adding more hashes as well, if your database grows more "tables" in the future. If the code only does lookups, you should only need to implement a FETCH method. Something like: (untested; the tied object is an array of hashrefs)
sub FETCH {
    my ($self, $key) = @_;
    foreach my $hash (@$self) {
        return $hash->{$key} if exists $hash->{$key};
    }
    return undef;
}