in reply to memory use when merging tied hashrefs

Why are the hashes tied with DB_File? Are they a persistent datastore from which your program draws input? Does your program produce them, using disk instead of memory to reduce memory footprint? Will these databases grow further?

You may need to tie %hash3 to another DB_File and copy the contents key-by-key as other monks have suggested. Alternatively, change your lookup code to check %hash2 first and fall back to %hash1 if the key is not found, eliminating %hash3 entirely.
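A minimal sketch of both techniques (the filenames `db1.db`, `db2.db`, `db3.db` are hypothetical stand-ins for the poster's real databases):

```perl
use strict;
use warnings;
use DB_File;
use Fcntl;

tie my %hash1, 'DB_File', 'db1.db', O_RDWR|O_CREAT, 0666, $DB_HASH or die $!;
tie my %hash2, 'DB_File', 'db2.db', O_RDWR|O_CREAT, 0666, $DB_HASH or die $!;

# Sample data for illustration only
$hash1{apple} = 'from hash1';
$hash1{pear}  = 'only hash1';
$hash2{apple} = 'from hash2';

# Technique 1: copy key-by-key into a third tied hash. Because %hash3 is
# disk-backed too, no large in-memory hash is ever built.
tie my %hash3, 'DB_File', 'db3.db', O_RDWR|O_CREAT, 0666, $DB_HASH or die $!;
while ( my ($k, $v) = each %hash1 ) { $hash3{$k} = $v }
while ( my ($k, $v) = each %hash2 ) { $hash3{$k} = $v }  # %hash2 wins on collision

# Technique 2: no %hash3 at all -- check %hash2 first, fall back to %hash1.
sub lookup {
    my $key = shift;
    return exists $hash2{$key} ? $hash2{$key} : $hash1{$key};
}
```

Technique 2 trades the one-time merge cost for one extra `exists` check per lookup, and it never duplicates the data on disk.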

Replies are listed 'Best First'.
Re^2: memory use when merging tied hashrefs
by Anonymous Monk on Nov 14, 2019 at 16:18 UTC
    > Why are the hashes tied with DB_File?

    I was searching thru a lot of files that have to be processed before the search, which worked ok with hundreds of files, but with thousands the search is way faster when everything is preprocessed and stored in a database. DB_File is a very fast core perl module. I love it.

    > Are they a persistent datastore from which your program draws input?

    They are, created and updated automatically.

    > Does your program produce them, using disk instead of memory to reduce memory footprint?

    The program produces the database to avoid traversing the filesystem to search thru file contents. I guess it uses memory to avoid roaming all over the disk, and it uses some disk space but the DBM_Filter "compress" cuts that in half.
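For reference, pushing the "compress" filter onto a DB_File tie looks like this (the filename is hypothetical, and the "compress" filter requires Compress::Zlib to be installed):

```perl
use strict;
use warnings;
use DB_File;
use DBM_Filter;   # adds Filter_Push to the tied object
use Fcntl;

tie my %db, 'DB_File', 'compressed.db', O_RDWR|O_CREAT, 0666, $DB_HASH
    or die "tie failed: $!";

# Keys and values are now zlib-compressed on write, inflated on read.
(tied %db)->Filter_Push('compress');

$db{hello} = 'world' x 100;   # stored compressed, read back transparently
```

DBM_Filter also offers Filter_Key_Push and Filter_Value_Push if you only want to compress one side of each pair.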

    > Will these databases grow further?

    They may grow automatically when they get accessed, by checking the files they are based on for changes and making updates, before granting read access.

    > You may need to tie %hash3 to another DB_File and copy the contents key-by-key as other monks have suggested, or change your lookup code to check %hash2 and then %hash1 if the key is not found in %hash2 and eliminate %hash3 entirely.

    I've been realizing the first technique may be necessary. I prefer your second suggestion but the code resists that solution.

      You could also "change your lookup code" by using Eily's advice and making a tied hash that encapsulates that search across a list of hashes. This would be extensible for adding more hashes as well, if your database grows more "tables" in the future. If the code only does lookups, you should only need to implement a FETCH method. Something like: (untested; the tied object is an array of hashrefs)

      sub FETCH {
          my $self = shift;
          my $key  = shift;
          foreach my $hash (@$self) {
              return $hash->{$key} if exists $hash->{$key};
          }
          return undef;
      }
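      Fleshing that out into a complete read-only tie class (the package name Tie::MultiHash is made up for illustration; with real DB_File-tied hashes you would pass references to them instead of the plain hashes used here):

```perl
package Tie::MultiHash;
use strict;
use warnings;

# The tied object is an array of hashrefs, searched in order.
sub TIEHASH {
    my ($class, @hashrefs) = @_;
    return bless [@hashrefs], $class;
}

sub FETCH {
    my ($self, $key) = @_;
    foreach my $hash (@$self) {
        return $hash->{$key} if exists $hash->{$key};
    }
    return undef;
}

sub EXISTS {
    my ($self, $key) = @_;
    foreach my $hash (@$self) {
        return 1 if exists $hash->{$key};
    }
    return 0;
}

package main;
use strict;
use warnings;

my %hash1 = ( apple => 'from hash1', pear => 'only hash1' );
my %hash2 = ( apple => 'from hash2' );

# %hash2 listed first, so it shadows %hash1 -- same precedence as a merge.
tie my %merged, 'Tie::MultiHash', \%hash2, \%hash1;

print $merged{apple}, "\n";
print $merged{pear},  "\n";
```

      Adding a fourth or fifth database later is just one more hashref in the tie call; the lookup code never changes.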