Dynamically switching between in-memory and disk-tied data structures

Limbic~Region has asked for the wisdom of the Perl Monks concerning the following question:

All,
I am writing a generic peg board (solitaire) solver. There is some CS theory that I haven't researched yet, so for now this is brute force. The issue I am facing is balancing speed and memory since this is a generic solver and can't be optimized for a specific case.

The method I currently intend to use starts out requiring very little memory, peaks halfway through the solution, and then tapers off. It looks a little bit like this:

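my $old_work = get_initial_work();   # hash reference of pending jobs
my $new_work = {};
while (%$old_work) {
    for my $item (keys %$old_work) {
        # Remove each job as it is processed so its memory can be reused
        my $job = delete $old_work->{$item};
        for (new_work($job)) {
            $new_work->{$item . $_} = $_;
        }
    }
    # This pass's output becomes the next pass's input
    ($new_work, $old_work) = ({}, $new_work);
}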

While I can easily determine the upper boundary memory requirement for any board, I can't know for sure how close to that boundary the code will actually get. One way to approach this would be to allow the user to indicate the maximum amount of memory they are willing to spend and, if the upper boundary exceeds that amount, resort to BerkeleyDB. This is an "all or none" approach.

I can improve upon this a bit because I can calculate the upper boundary memory requirement for each pass through the work process. I can switch between in-memory and BerkeleyDB as soon as I know I might exceed the user-defined threshold and then revert as soon as I know that I won't. This is a "per-pass" approach and is the strategy I plan on using at the moment.
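A minimal sketch of the per-pass decision described above, under several assumptions: the per-pass upper bounds and the byte limit are made-up numbers, the helpers `new_store` and `store_kind` are hypothetical names, and SDBM_File (which ships with Perl) stands in for BerkeleyDB so that the sketch runs without extra modules.

```perl
use strict;
use warnings;
use Fcntl;                      # O_RDWR, O_CREAT
use SDBM_File;                  # core stand-in for BerkeleyDB (assumption)
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);

# Return a hash reference: a plain hash when the pass is known to fit
# in memory, otherwise a hash tied to a file on disk.
sub new_store {
    my ($bound, $limit, $pass) = @_;
    return {} if $bound <= $limit;
    tie my %disk, 'SDBM_File', "$dir/pass$pass", O_RDWR | O_CREAT, 0666
        or die "Cannot tie pass $pass: $!";
    return \%disk;
}

# Report which kind of store a hash reference points at.
sub store_kind {
    my ($store) = @_;
    return tied(%$store) ? 'disk' : 'memory';
}

my $limit  = 1_000;                          # hypothetical user limit (bytes)
my @bounds = (200, 900, 5_000, 1_200, 300);  # hypothetical per-pass upper bounds

my @kinds;
for my $pass (0 .. $#bounds) {
    my $store = new_store($bounds[$pass], $limit, $pass);
    $store->{example} = $pass;               # both kinds are used the same way
    push @kinds, store_kind($store);
}
print "@kinds\n";   # prints: memory memory disk disk memory
```

Because both kinds of store are reached through a hash reference, the work loop itself should not need to know which one it was given.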

What I would like to do, however, is make the switch only if I actually approach the limit. To do that I would need to know the memory consumed by $old_work and $new_work together, and have a way of switching to BerkeleyDB. For now I am knowingly ignoring lots of details, such as Perl not necessarily freeing memory to the OS. What I am interested in knowing is:

Is there a sane way of determining memory consumption of a running program from within that program?
Will Devel::Size tell me how much memory just $old_work and $new_work are using?
Is there a smart way to load all the key/value pairs of an existing hash ref into a BerkeleyDB when it is initialized?
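On the third question, one possible approach (a sketch, not necessarily the smartest way) is to move the pairs into an already-tied hash one at a time, deleting each from the in-memory hash as it is written so that the two copies never both exist in full. The helper name `spill_to_disk` is hypothetical, and SDBM_File is used here only as a core stand-in for a hash tied to BerkeleyDB::Hash. SDBM limits the size of each key/value pair to roughly 1 KB, so it suits the demonstration rather than real boards.

```perl
use strict;
use warnings;
use Fcntl;                      # O_RDWR, O_CREAT
use SDBM_File;                  # core stand-in for BerkeleyDB::Hash (assumption)
use File::Temp qw(tempdir);

# Move every pair from an in-memory hash into an already-tied hash.
# Each key is deleted from the source as soon as it has been written.
sub spill_to_disk {
    my ($src, $dst) = @_;
    my $moved = 0;
    for my $key (keys %$src) {
        $dst->{$key} = delete $src->{$key};
        $moved++;
    }
    return $moved;
}

my $dir = tempdir(CLEANUP => 1);
tie my %on_disk, 'SDBM_File', "$dir/work", O_RDWR | O_CREAT, 0666
    or die "Cannot tie: $!";

# Made-up work items standing in for board configurations.
my $work  = { map { ("board$_" => "move$_") } 1 .. 100 };
my $moved = spill_to_disk($work, \%on_disk);

printf "moved %d pairs, %d left in memory\n", $moved, scalar keys %$work;
# prints: moved 100 pairs, 0 left in memory
```

Tied DBM hashes store plain strings, so if the values are nested structures they would need to be serialized first.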

The reason I would like to do this is that many board configurations are likely impossible due to the constraints of the game. Only valid configurations are visited so the actual memory consumption may never approach the upper boundary and keeping things in memory will certainly be faster. Thanks in advance for your time and consideration in this matter.

Cheers - L~R


Re: Dynamically switching between in-memory and disk-tied data structures by BrowserUk (Patriarch) on Nov 07, 2006 at 17:59 UTC
Using Devel::Size for this is likely to cause problems, because it uses a fairly substantial amount of memory itself when calculating the size of nested data structures. Much as Data::Dumper does, it needs that memory to keep track of what it has already visited so that it can cope with cyclical references. I'd advise just using the cachesize parameter (via the BerkeleyDB::Env class) and setting it as large as possible without pushing the process into swapping. It will probably do a better job of balancing performance than you can.
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal? "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.
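For reference, a sketch of what setting the cache size through BerkeleyDB::Env might look like. It assumes the BerkeleyDB module from CPAN is installed. The 64 MB figure, the temporary directory, and the file name work.db are placeholders, not recommendations.

```perl
use strict;
use warnings;
use BerkeleyDB;                 # CPAN module; must be installed
use File::Temp qw(tempdir);

my $home       = tempdir(CLEANUP => 1);   # placeholder environment directory
my $cache_size = 64 * 1024 * 1024;        # placeholder budget: 64 MB

# An environment whose memory pool is given the requested cache size.
my $env = BerkeleyDB::Env->new(
    -Home      => $home,
    -Cachesize => $cache_size,
    -Flags     => DB_CREATE | DB_INIT_MPOOL,
) or die "Cannot open environment: $BerkeleyDB::Error";

# A hash tied to a database inside that environment.
tie my %work, 'BerkeleyDB::Hash',
    -Filename => 'work.db',
    -Flags    => DB_CREATE,
    -Env      => $env
    or die "Cannot tie work.db: $BerkeleyDB::Error";

$work{example} = 'value';
print "$work{example}\n";
```

With this arrangement the user-supplied memory limit would be passed in as the cache size, and the library decides what stays in memory.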
Re: Dynamically switching between in-memory and disk-tied data structures by perrin (Chancellor) on Nov 07, 2006 at 17:02 UTC
I really think you're better off just using BerkeleyDB all the time if you know you'll have to use it sometimes. It will buffer the data in shared memory as much as possible, so while it won't be as fast as a Perl hash, it won't be as slow as writing to disk either.
Re: Dynamically switching between in-memory and disk-tied data structures by Anonymous Monk on Nov 07, 2006 at 17:11 UTC
Since you know who you are ($$), you can always ask the system how much memory you are consuming, for instance by using 'ps' or by peeking in '/proc' (depending on your system). Devel::Size can tell you how much memory a variable uses, but note that if such a variable is an inside-out object, more memory can be associated with that variable than Devel::Size reports.
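A rough sketch of both measurements mentioned in this reply. It assumes a Linux-style /proc/$$/status with a VmRSS line and falls back to `ps -o rss= -p $$` elsewhere. The helper name `rss_kb` and the test data are hypothetical. Devel::Size is a CPAN module, so it is used only if it happens to be installed.

```perl
use strict;
use warnings;

# Resident set size of this process in kilobytes, or undef if it
# cannot be determined on this system.
sub rss_kb {
    # Linux: /proc/<pid>/status has a line such as "VmRSS:   12345 kB".
    if (open my $fh, '<', "/proc/$$/status") {
        while (my $line = <$fh>) {
            return $1 if $line =~ /^VmRSS:\s+(\d+)\s+kB/;
        }
    }
    # Elsewhere: ask ps for the RSS column of our own pid.
    my $out = `ps -o rss= -p $$ 2>/dev/null`;
    return $1 if defined $out && $out =~ /(\d+)/;
    return;
}

my $before = rss_kb();
my %work   = map { ("board$_" => "move$_") } 1 .. 200_000;   # made-up data
my $after  = rss_kb();

if (defined $before && defined $after) {
    printf "RSS grew from %d kB to %d kB\n", $before, $after;
}
else {
    print "Could not determine memory use on this system\n";
}

# Per-variable figure, only if Devel::Size is available.
if (eval { require Devel::Size; 1 }) {
    printf "Devel::Size reports %d bytes for the hash\n",
        Devel::Size::total_size(\%work);
}
```

The RSS figure covers the whole process, not just the work hashes, and it may not shrink when a hash is emptied, since Perl does not necessarily return freed memory to the OS.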