That is hot and I am, as usual, amazed at your comp-sci chops. I cobbled together a DB_File example because I was curious whether it would be slower.
It's not a 1:1 comparison because it's a straight key lookup, not a real search like yours; but that's what I took the OP to want/need. I only built my test file up to .75GB-ish because of lack of headroom on my drive (I only have 2GB left and I had to build two of these files so...).
Fake data generator:
my @what = qw( bothChaff bothDegen oneChaff oneDegen );
open my $fh, ">", "data.data" or die;
for ( 10000000 .. 99999999 ) {
    next if rand(10) > 6.7;
    print $fh sprintf("11010%d\t11010%d\t%s\n", $_, $_+3333, $what[rand@what]);
    last if -s $fh > 1_000 * 1_000 * 1_000;
}
Search + search DB builder:
use DB_File;
use strict;
use Time::HiRes qw[ time ];

my $start = time;
my %A;
tie %A, "DB_File", "db.file", O_CREAT|O_RDWR, 0666, $DB_HASH or die;

if ( $ARGV[0] eq "build" ) {
    open my $fh, "<", "data.data" or die;
    while (<$fh>) {
        chomp;
        my ( $key, $val ) = split /\s+/, $_, 2;
        $A{$key} = $val;
    }
}
else {
    print $A{$ARGV[0]} || "nothing found", $/;
    printf "Took %.2f seconds\n", time() - $start;
}
--
moo@cow[331]~/bin>pm-768941 1101078637800
1101078641133 oneChaff
Took 0.02 seconds
I got 0.02 seconds on almost every run, and this is on a nine-year-old computer with, I think, a five-year-old copy of the related Berkeley DB binaries. I changed the printf format to %.3f and it mostly ranged from 0.016 to 0.019 seconds. One caveat: it took a loooooooong time to build the DB file. I wasn't clocking it but it was edging up on an hour.
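Since $start is set before the tie, the figures above include opening the DB file as well as the lookup itself. A minimal sketch of timing the two separately follows. It assumes a small sample DB in a scratch directory rather than the real 0.75GB file; the file name sample.db, the 10,000-key range, and the fixed "oneChaff" value are made up for illustration and are not from the test above.

```perl
use strict;
use warnings;
use DB_File;
use File::Temp qw( tempdir );
use Time::HiRes qw( time );

# Hypothetical scratch location; the script above uses "db.file" in the cwd.
my $dir  = tempdir( CLEANUP => 1 );
my $file = "$dir/sample.db";

# Build a small sample DB with the same key/value shape as the fake data.
my %db;
tie %db, "DB_File", $file, O_CREAT|O_RDWR, 0666, $DB_HASH
    or die "Cannot tie $file: $!";
for my $n ( 10_000_000 .. 10_009_999 ) {
    $db{"11010$n"} = sprintf "11010%d\toneChaff", $n + 3333;
}
untie %db;

# Time the tie (open) and the single key lookup separately.
my $t0 = time;
tie %db, "DB_File", $file, O_RDONLY, 0666, $DB_HASH
    or die "Cannot tie $file: $!";
my $t1  = time;
my $val = $db{"1101010000042"};
my $t2  = time;
untie %db;

printf "tie:    %.6f seconds\n", $t1 - $t0;
printf "lookup: %.6f seconds\n", $t2 - $t1;
print defined $val ? $val : "nothing found", "\n";
```

On a DB this small the numbers will not be comparable to the full-size file; the point is only to show where the two measurements would go.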
<rejoinder type="in good fun!"> The trouble with engineers who are smarter than most everyone else is that given Y they solve for X where X is a more interesting problem than Y. </rejoinder>
In reply to Re^2: Rapid text searches ( O(1) space and time)
by Your Mother
in thread Rapid text searches
by joomanji