Summary

I wanted to run a Test::More script that would ultimately execute over 10_000_000 tests. It died, however, after about 8_000_000 because it ran out of memory. After some investigation, I found that Test::Builder retains a record for every test run, and this is likely why my test died.

In this meditation I look at a few solutions to this problem.
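
The retained records can be seen through the builder's documented details method. This is only a small sketch: the three trivial tests and their names are made up for illustration, and it shows that one record is kept per test, not how much memory each one costs.

```perl
use strict;
use warnings;
use Test::More tests => 3;

# Three throwaway tests, purely for illustration.
ok( 1, "check $_" ) for 1 .. 3;

# details() returns one hash reference per test run so far,
# so this list grows with every call to ok(), is(), and friends.
my @details = Test::More->builder->details;

diag( scalar(@details) . ' result records retained' );
diag( 'fields: ' . join( ', ', sort keys %{ $details[0] } ) );
```

Scale that list up to millions of entries and the memory use follows.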

Background (What I was really trying to do.)

At $work, we have a multi-terabyte NFS mounted storage pool with millions of files, each with a record in the application's database. I wrote a few audit tools to confirm (1) that each file in storage has a record in the database, (2) that each record in the database has a file in storage, and (3) that the MD5 hash for the file in storage matches the one in the database. They also do some other sanity checking.

I thought it would be a good idea to (ab)use standard testing tools to write this. It could output TAP and run under Test::Harness. It would be easier to automate a "quick" day-long sanity check.

From the perspective of the testing framework, there are multiple tests per file. Each test verifies the correctness of some property of the files and their relationship to the database.

Planning a lot of tests (The opening of hostilities.)

This is actually pretty easy. I open the database and ask it how many files there are supposed to be. Then I use that for my plan.

use Test::More;
use File::Find;
use DBI;

my $dbh = DBI->connect( ... );
my ($file_count) = $dbh->selectrow_array( 'SELECT count(*) FROM t' );
plan 'tests' => $tests_per_file * $file_count;

find({ wanted => \&verify, follow_fast => 1 }, $storage_dir );
diag( "It's normal to run more tests than planned because files have been created since the records were counted" );
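
For a sense of what a verify callback with multiple tests per file might look like, here is a self-contained sketch. Everything in it is a stand-in: a temporary directory plays the storage pool, the %md5_for hash plays the database, and the two checks are only examples of the kind of property being tested. It uses no_chdir rather than follow_fast to keep the path handling simple.

```perl
use strict;
use warnings;
use Test::More;
use File::Find;
use File::Temp qw( tempdir );
use Digest::MD5;

# Hypothetical stand-ins for the real storage pool and database.
my $storage_dir = tempdir( CLEANUP => 1 );
my %md5_for;    # path => expected MD5, playing the role of the database

for my $name (qw( a.dat b.dat )) {
    my $path    = "$storage_dir/$name";
    my $content = "contents of $name\n";
    open my $out, '>', $path or die "Can't write '$path': $!";
    binmode $out;
    print {$out} $content;
    close $out or die "Can't close '$path': $!";
    $md5_for{$path} = Digest::MD5::md5_hex($content);
}

my $tests_per_file = 2;
plan tests => $tests_per_file * scalar keys %md5_for;

# Called by File::Find for every entry; runs two tests per plain file.
sub verify {
    my $path = $File::Find::name;
    return if !-f $path;

    ok( exists $md5_for{$path}, "record exists for $path" );

    open my $in, '<', $path or die "Can't read '$path': $!";
    binmode $in;
    my $actual = Digest::MD5->new->addfile($in)->hexdigest;
    is( $actual, $md5_for{$path}, "MD5 matches for $path" );
}

find( { wanted => \&verify, no_chdir => 1 }, $storage_dir );
```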

Method 1: Change Test::Builder (Plead for mercy.)

I filed a change request, but my expectations are pretty low. Having looked into the code a little, I think this change is easier said than done.

Method 2: Use the disk. (tie to DBM::Deep.)

I didn't actually try this, but I'm pretty sure it would work.

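use DBM::Deep;

# Before testing: start from a fresh results file.
my $results_db = 'test_results.db';
# Parenthesize unlink's argument; as a list operator it would otherwise
# swallow the whole "&& -e ..." expression.
if ( -e $results_db && ! unlink($results_db) ) {
    die "Can't unlink existing results db '$results_db': $!";
}
my $db = tie my @test_results, 'DBM::Deep', $results_db;
Test::More->builder->{Test_Results} = \@test_results;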

This should cause the test results to go to the test_results.db file on disk instead of hogging memory. When testing is over, you'll want to unlink that file.

The elements of Test::More->builder->{Test_Results} are hash references, so my first choice of Tie::File wouldn't work.

Method 3: Delete test results (Lie to the framework.)

Out of millions of tests, I expect maybe a few hundred failures. All the successes are more or less the same to me. So maybe I can make an array where every success is the same success: let there be only one success, and let every subsequent success be merely a reference to that one.

package Tie::StdArray::TestResults;
use Tie::Array;
@Tie::StdArray::TestResults::ISA = ( 'Tie::StdArray' );

use List::Util qw( first );

sub default_STORE { $_[0]->[$_[1]] = $_[2] }

sub STORE {
    my ( $self, $index, $val ) = @_;

    return &default_STORE    if ref $val ne ref {};
    return &default_STORE    if ! $val->{ok};

    my $first_ok = first { ref $_ eq ref {} and $_->{ok} } @{ $self };

    return &default_STORE    if ! $first_ok;

    return $self->default_STORE( $index, $first_ok );
}

package main;

use Test::More;

tie my @test_results, 'Tie::StdArray::TestResults';
Test::More->builder->{Test_Results} = \@test_results;

Careful application of Data::Dumper shows an array with one hash ref and other elements that reference the same hash. This gives me confidence that the DBM::Deep method would work also, even though I haven't tried it.
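
The sharing can also be checked without Test::Builder in the picture. The sketch below uses a condensed tie class with the hypothetical name Local::SharedOK and hand-made result hashes (only the ok and name fields, which is an assumption about what matters here), then counts distinct hash references with refaddr.

```perl
use strict;
use warnings;

package Local::SharedOK;    # hypothetical name, for illustration only
use Tie::Array;
use List::Util qw( first );
our @ISA = ('Tie::StdArray');

# Store failures as-is; replace each success after the first with a
# reference to the first success already in the array.
sub STORE {
    my ( $self, $index, $val ) = @_;
    if ( ref $val eq 'HASH' && $val->{ok} ) {
        my $first_ok = first { ref $_ eq 'HASH' && $_->{ok} } @{$self};
        $val = $first_ok if $first_ok;
    }
    return $self->[$index] = $val;
}

package main;
use Scalar::Util qw( refaddr );

tie my @results, 'Local::SharedOK';

# Five fake results; the one at index 2 is a failure.
$results[$_] = { ok => ( $_ == 2 ? 0 : 1 ), name => "test $_" } for 0 .. 4;

my %distinct;
$distinct{ refaddr $_ }++ for @results;

printf "%d results stored, %d distinct hashes\n",
    scalar @results, scalar keys %distinct;
# prints: 5 results stored, 2 distinct hashes
```

The price of the lie is visible here too: every shared success reports the name of the first one.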

Conclusion

It can hardly be denied: tie can cure and cause a multitude of sins.

In reply to More tests than you shake a memory stick at by kyle
