
That is hot and I am, as usual, amazed at your comp-sci chops. I cobbled together a DB_File example because I was curious whether it would be slower.

It's not a 1:1 comparison because it's a straight key lookup, not a real search like yours, but that's what I took the OP to want/need. I only built my test file up to 0.75GB-ish for lack of headroom on my drive (I only have 2GB left and I had to build two of these files so...).

Fake data generator-

use strict;
use warnings;

my @what = qw( bothChaff bothDegen oneChaff oneDegen );

open my $fh, ">", "data.data" or die "Cannot open data.data: $!";

for ( 10000000 .. 99999999 )
{
    next if rand(10) > 6.7;    # keep roughly two thirds of the keys
    printf $fh "11010%d\t11010%d\t%s\n",
               $_, $_+3333, $what[ rand @what ];
    last if -s $fh > 1_000 * 1_000 * 1_000;    # stop at about 1GB
}

Search + search DB builder-

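use strict;
use warnings;
use DB_File;
use Time::HiRes qw[ time ];

my $start = time;

my $arg = $ARGV[0];
die "Usage: $0 build | <key>\n" unless defined $arg;

# Tie the hash to an on-disk Berkeley DB hash file, creating it if needed.
my %A;
tie %A, "DB_File", "db.file", O_CREAT|O_RDWR, 0666, $DB_HASH
    or die "Cannot tie db.file: $!";

if ( $arg eq "build" )
{
    open my $fh, "<", "data.data" or die "Cannot open data.data: $!";

    # First column is the key; the rest of the line is the value.
    while (<$fh>)
    {
        chomp;
        my ( $key, $val ) = split /\s+/, $_, 2;
        $A{$key} = $val;
    }
}
else
{
    print exists $A{$arg} ? $A{$arg} : "nothing found", $/;
    printf "Took %.2f seconds\n", time() - $start;
}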
--
moo@cow[331]~/bin>pm-768941 1101078637800
1101078641133   oneChaff
Took 0.02 seconds

I got 0.02 seconds on almost every run, and this is on a nine-year-old computer with, I think, a five-year-old copy of the related Berkeley binaries. I changed the printf to %.3f and it mostly ranged from 0.016 to 0.019 seconds. One caveat: it took a loooooooong time to build the DB file. I wasn't clocking it but it was edging up on an hour.
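On the slow build: one knob that might help is handing tie a DB_File::HASHINFO object with a larger cachesize and an nelem estimate instead of the default $DB_HASH. This is only a minimal sketch and I have not timed it against the build above; the 64MB cache, the nelem guess, the temp file, and the 1,000 stand-in records are all assumptions for illustration, and how much it helps (if at all) will depend on the Berkeley DB version underneath.

```perl
use strict;
use warnings;
use Fcntl;                       # O_* flags (DB_File also exports these)
use DB_File;
use File::Temp qw[ tempdir ];

# Assumed tuning values, not measured; size them to the RAM you can spare.
my $info = DB_File::HASHINFO->new;
$info->{cachesize} = 64 * 1024 * 1024;   # suggested memory cache, in bytes
$info->{nelem}     = 1_000;              # rough estimate of the final key count

# Throwaway location so the sketch does not touch a real db.file.
my $dir  = tempdir( CLEANUP => 1 );
my $file = "$dir/db.file";

my %A;
tie %A, "DB_File", $file, O_CREAT|O_RDWR, 0666, $info
    or die "Cannot tie $file: $!";

# Stand-in for the data.data loop: same key/value shape as the generator.
for my $n ( 10000000 .. 10000999 )
{
    $A{"11010$n"} = sprintf "11010%d\toneChaff", $n + 3333;
}

untie %A;    # flush everything to disk
```

For the real build, the loop over data.data would replace the stand-in loop, and nelem would be set to something near the expected number of lines.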

<rejoinder type="in good fun!"> The trouble with engineers who are smarter than most everyone else is that given Y they solve for X where X is a more interesting problem than Y. </rejoinder>

In reply to Re^2: Rapid text searches ( O(1) space and time) by Your Mother
in thread Rapid text searches by joomanji
