More tests than you shake a memory stick at

Summary

I wanted to run a Test::More script that would ultimately execute over 10_000_000 tests. It died, however, after about 8_000_000 because it ran out of memory. After some investigation, I found that Test::Builder retains a record for every test run, and this is likely why my test died.

In this meditation I look at a few solutions to this problem.
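You can see this record-keeping through Test::Builder's public `details` method, which returns one hash reference for each test run so far. Here is a small sketch; the test count is arbitrary.

```perl
use strict;
use warnings;
use Test::More;

# Run an arbitrary batch of trivially passing tests.
my $n = 1_000;
pass("test $_") for 1 .. $n;

# Test::Builder has kept one result record (a hash ref) for each of them.
my @records = Test::More->builder->details;
diag( scalar(@records) . " result records are being held in memory" );
diag( 'fields in each record: ' . join( ', ', sort keys %{ $records[0] } ) );

done_testing();
```

The list only grows as tests run. With millions of tests, that growth is what exhausts memory.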

Background (What I was really trying to do.)

At $work, we have a multi-terabyte, NFS-mounted storage pool with millions of files, each with a record in the application's database. I wrote a few audit tools to confirm (1) that each file in storage has a record in the database, (2) that each record in the database has a file in storage, and (3) that the md5 hash of each file in storage matches the one in the database. They also do some other sanity checking.

I thought it would be a good idea to (ab)use standard testing tools to write this. That way the audit could output TAP and run under Test::Harness, which would make it easier to automate a "quick" day-long sanity check.

From the perspective of the testing framework, there are multiple tests per file. Each test verifies the correctness of some property of the files and their relationship to the database.
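The meditation doesn't show `verify()` itself. The sketch below is a hypothetical version that covers two of the checks described above: the file has a database record, and its md5 matches. A temp directory stands in for the storage pool, and the hash `%md5_in_db` stands in for the database. Both are assumptions made for illustration.

```perl
use strict;
use warnings;
use Digest::MD5;
use File::Find;
use File::Spec;
use File::Temp qw( tempdir );
use Test::More;

# Hypothetical stand-ins: a temp dir for the storage pool and a hash
# (path => md5) for the database.
my $storage_dir = tempdir( CLEANUP => 1 );
my %md5_in_db;
for my $name (qw( a.dat b.dat )) {
    my $path = File::Spec->catfile( $storage_dir, $name );
    open my $out, '>', $path or die "Can't write '$path': $!";
    print {$out} "contents of $name\n";
    close $out;
    $md5_in_db{$path} = Digest::MD5::md5_hex("contents of $name\n");
}

# Every file gets the same fixed number of tests.
my $tests_per_file = 2;

sub verify {
    return if ! -f $_;    # only audit plain files
    my $path = $File::Find::name;

    ok( exists $md5_in_db{$path}, "$path has a database record" );

    my $md5 = '';
    if ( open my $fh, '<:raw', $_ ) {
        $md5 = Digest::MD5->new->addfile($fh)->hexdigest;
    }
    is( $md5, $md5_in_db{$path}, "$path md5 matches the database" );
}

plan tests => $tests_per_file * scalar( keys %md5_in_db );
find( { wanted => \&verify }, $storage_dir );
```

Every file runs the same number of tests, even when it can't be opened. That is what lets the plan be computed up front as a simple product.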

Planning a lot of tests (The opening of hostilities.)

This is actually pretty easy. I open the database and ask it how many files there are supposed to be. Then I use that for my plan.

use Test::More;
use File::Find;
use DBI;

# $tests_per_file, $storage_dir, and verify() are defined elsewhere.
my $dbh = DBI->connect( ... );
my ($file_count) = $dbh->selectrow_array( 'SELECT count(*) FROM t' );
plan 'tests' => $tests_per_file * $file_count;

find({ wanted => \&verify, follow_fast => 1 }, $storage_dir );
diag( "It's normal to run more tests than planned because files have been created since the records were counted" );

Method 1: Change Test::Builder (Plead for mercy.)

I filed a change request, but my expectations are pretty low. Having looked into the code a little, I think this change is easier said than done.

Method 2: Use the disk. (tie to DBM::Deep.)

I didn't actually try this, but I'm pretty sure it would work.

use DBM::Deep;

# before testing.
# Remove any results file left over from an earlier run.
my $results_db = 'test_results.db';
if ( -e $results_db && ! unlink $results_db ) {
    die "Can't unlink existing results db '$results_db': $!";
}
my $db = tie my @test_results, 'DBM::Deep', $results_db;
Test::More->builder->{Test_Results} = \@test_results;

This should cause the test results to go to the test_results.db file on disk instead of hogging memory. When testing is over, you'll want to unlink that file.

The elements of Test::More->builder->{Test_Results} are hash references, so my first choice of Tie::File wouldn't work.
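Tie::File maps each array element to one line of a text file, so a hash reference stored in it survives only as its string form. The sketch below shows this with a scratch file. File::Temp is used only for convenience.

```perl
use strict;
use warnings;
use File::Temp ();
use Tie::File;

# A scratch file for Tie::File to manage; removed when $tmp goes away.
my $tmp = File::Temp->new;
tie my @lines, 'Tie::File', $tmp->filename
    or die "Can't tie the scratch file: $!";

# Store a hash shaped like one of Test::Builder's result records.
$lines[0] = { ok => 1, name => 'some test' };

# Read it back: Tie::File only kept the reference's string form.
my $back = $lines[0];
print ref $back ? "still a reference\n" : "just a string: $back\n";

untie @lines;
```

DBM::Deep, by contrast, stores nested hashes and arrays natively, which is why it is the better fit here.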

Method 3: Delete test results (Lie to the framework.)

Out of millions of tests, I expect maybe a few hundred failures. All the successes are more or less the same to me. So maybe I can make an array where every success is the same success. Let there be only one success, and let every subsequent success be merely a reference to that one.

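package Tie::StdArray::TestResults;
use strict;
use warnings;
use Tie::Array;
our @ISA = ( 'Tie::StdArray' );

use List::Util qw( first );

# A plain store into the underlying array, as Tie::StdArray does it.
sub default_STORE { $_[0]->[$_[1]] = $_[2] }

sub STORE {
    my ( $self, $index, $val ) = @_;

    # Keep failures, and anything that isn't a result hash, as they are.
    return $self->default_STORE( $index, $val )
        if ref $val ne 'HASH' || ! $val->{ok};

    # Look for a success that has already been stored.
    my $first_ok = first { ref $_ eq 'HASH' && $_->{ok} } @{ $self };

    # This is the first success, so keep it.
    return $self->default_STORE( $index, $val ) if ! $first_ok;

    # Every later success is stored as a reference to that first one.
    return $self->default_STORE( $index, $first_ok );
}

package main;

use Test::More;

tie my @test_results, 'Tie::StdArray::TestResults';
Test::More->builder->{Test_Results} = \@test_results;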

Careful application of Data::Dumper shows an array with one hash ref and other elements that reference the same hash. This gives me confidence that the DBM::Deep method would work also, even though I haven't tried it.
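You can reproduce the Data::Dumper check without relying on Test::Builder's internals. Store result-shaped hashes into the tied array directly, by index, the way Test::Builder assigns them. The class below is a condensed restatement of the one above. The hash fields mirror Test::Builder's result records (ok, actual_ok, name, type, reason), and the test names are made up.

```perl
use strict;
use warnings;

package Tie::StdArray::TestResults;
use Tie::Array;
use List::Util qw( first );
our @ISA = ( 'Tie::StdArray' );

# Condensed STORE: every success after the first is replaced by a
# reference to that first success; everything else is stored as-is.
sub STORE {
    my ( $self, $index, $val ) = @_;
    if ( ref $val eq 'HASH' && $val->{ok} ) {
        my $first_ok = first { ref $_ eq 'HASH' && $_->{ok} } @{ $self };
        $val = $first_ok if $first_ok;
    }
    $self->[$index] = $val;
}

package main;
use Data::Dumper;
use Scalar::Util qw( refaddr );

# Build a hash shaped like a Test::Builder result record.
sub result {
    my ( $name, $ok ) = @_;
    return +{ ok => $ok, actual_ok => $ok, name => $name, type => '', reason => '' };
}

tie my @results, 'Tie::StdArray::TestResults';

my @records = (
    result( 'first',  1 ),
    result( 'second', 0 ),
    result( 'third',  1 ),
    result( 'fourth', 1 ),
);

# Assign by index, as Test::Builder does, rather than with push.
$results[$_] = $records[$_] for 0 .. $#records;

$Data::Dumper::Indent = 1;
print Dumper( [@results] );
printf "element %d is hash %#x\n", $_, refaddr( $results[$_] ) for 0 .. $#results;
```

One caveat: Tie::StdArray implements PUSH by pushing onto the underlying array directly. Only indexed assignment goes through this STORE.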

Conclusion

It can hardly be denied: tie can both cure and cause a multitude of sins.


Re: More tests than you shake a memory stick at by jeffa (Bishop) on Nov 11, 2008 at 19:52 UTC
Rather than trying to make the code run millions of tests, I would instead have organized my tests into smaller groups: 10 suites of at most 1 million tests each. Was there some reason you could not break your tests into groups? It appears that you are generating these tests dynamically somehow. It seems to me that adding another loop or such would let you run several passes over smaller groups in separate execs and collect the results after each run. jeffa
Re^2: More tests than you shake a memory stick at by kyle (Abbot) on Nov 11, 2008 at 20:20 UTC
I can group them if I have to. Each record in the database has an md5, so I can have one group for each hex digit and query out the records with an md5 that starts with just that digit. For smaller groups, more digits. Files on the disk are likewise organized by hashes, so when I send File::Find in to visit them all, I can have it focus on some subset. Thanks for the suggestion!
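The grouping described here can be sketched as two small helpers. One generates every hex prefix of a given length, and the other builds the count query for one group. The table name t comes from the plan code in the meditation. The md5 column name and the LIKE-based query are assumptions, not the actual schema.

```perl
use strict;
use warnings;

# All hex prefixes of the given length: 1 digit gives 16 groups, 2 gives 256.
sub hex_prefixes {
    my ($digits) = @_;
    my @prefixes = ('');
    for ( 1 .. $digits ) {
        @prefixes = map {
            my $prefix = $_;
            map { $prefix . $_ } 0 .. 9, 'a' .. 'f';
        } @prefixes;
    }
    return @prefixes;
}

# Hypothetical: the count query for one group, assuming table t has a
# column named md5 holding lowercase hex digests.
sub count_query_for {
    my ($prefix) = @_;
    return ( 'SELECT count(*) FROM t WHERE md5 LIKE ?', "$prefix%" );
}

# With a real DBI handle, one group's plan could come from:
#   my ( $sql, $pattern ) = count_query_for($prefix);
#   my ($file_count) = $dbh->selectrow_array( $sql, undef, $pattern );
for my $prefix ( hex_prefixes(1) ) {
    my ( $sql, $pattern ) = count_query_for($prefix);
    print "$pattern\t$sql\n";
}
```

Each prefix could then be handed to a separate run of the test script, as jeffa suggests, so no single process has to hold every result.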
Re: More tests than you shake a memory stick at by dragonchild (Archbishop) on Nov 11, 2008 at 18:18 UTC
Your DBM::Deep solution suggests that Test::Builder needs a patch to optionally store test results on disk instead of in RAM. Alternatively, that could kick in automatically when the plan is greater than N tests (where N could be 1_000_000 or some other ridiculously large number). I'd talk to Andy or chromatic about it. Talk to me about whatever DBM::Deep might need to make this happen.
Re: More tests than you shake a memory stick at by BrowserUk (Patriarch) on Nov 11, 2008 at 18:05 UTC
(ab)use, I think you called it. This tends to be the result of trying to force-fit your application to reuse code for purposes it was never intended for. You spend more time working out how to bypass, work around, or adapt to the interface or implementation impedance than you would have spent writing a bespoke solution.
Re: More tests than you shake a memory stick at by gwadej (Chaplain) on Nov 11, 2008 at 21:22 UTC
I've found Test::Group to be particularly handy for taming sets of tests that I want to run as a unit. So instead of having `$tests_per_file * $file_count` tests, I use Test::Group to make a single test that combines the `$tests_per_file` individual tests. In this case, that may still not be enough, but I did find that this approach tamed some of my tests. G. Wade
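Test::Group is a CPAN module. A swapped-in alternative that ships with Test::More itself is `subtest`, which groups tests in a similar way so that the top-level plan counts files rather than individual checks. Here is a sketch with hypothetical file names and checks. Whether grouping reduces what Test::Builder keeps in memory depends on its version, and this sketch does not measure that.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical files and per-file checks, standing in for the real audit.
my @files = qw( a.dat b.dat );

sub verify_file {
    my ($file) = @_;
    ok( length $file, "$file has a name" );
    like( $file, qr/\.dat\z/, "$file has the expected extension" );
}

plan tests => scalar @files;    # one top-level test per file

for my $file (@files) {
    subtest "audit $file" => sub {
        plan tests => 2;        # the per-file checks
        verify_file($file);
    };
}
```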
Re: More tests than you shake a memory stick at (no_plan) by tye (Sage) on Nov 11, 2008 at 23:10 UTC
So you only have to worry about this if you use `no_plan`? That seems to be implied by what you wrote, but I didn't find it stated explicitly enough. If you provide a `plan` (number of expected tests), does the (over-engineered) Test::Builder skip storing all of the test results? - tye
Re^2: More tests than you shake a memory stick at (no_plan) by kyle (Abbot) on Nov 11, 2008 at 23:48 UTC
Test::Builder collects test results regardless of whether you provide a plan. When I encountered the problem, I had a plan. When I reported it, I used `no_plan` just for brevity. `perl -e 'use Test::More "no_plan"; pass() while 1'`