With these volumes of data, 45,000 record/45 MB and 7 from 13 fields indexed using a Storable index, I achieved sub-1-second times to

  1. load both the data and index;
  2. Perform 1000 7-key searchs
  3. Retrieve 3 records for each of the 1000 matches.

Benchmark output:

c:\test>570563.pl 1 trial of load index (593.750ms total) 1000 trials of Perform 7-key search and retrieve 3 records (113.182ms +total), 113us/trial c:\test>570563.pl 1 trial of load index (625ms total) 1 trial of load datafile (94.282ms total) 1000 trials of Perform 7-key search and retrieve 3 records (117.603ms +total), 117us/trial c:\test>570563.pl Check memory 1 trial of load index (640.625ms total) 1 trial of load datafile (112.418ms total) 1000 trials of Perform 7-key search and retrieve 3 records (116.873ms +total), 116us/trial

Betcha can't achieve that performance with an RDBMS :)

POC code:

Flat file indexer (runs once) :

#! perl -slw use strict; use Storable; open DB, '<', '570563.dat' or die $!; my @index; my $offset = 0; while( <DB> ) { chomp; my @fields = split '\|'; push @{ $index[ $_ ]{ $fields[ $_ ] } }, $offset for 0,2,4,6,8,10,12; ## 7 x 100 byte fields indexed. $offset = tell DB; } close DB; store \@index, '570563.idx'; ## Index stored to file

Search & retrieval benchmark (999 sets of 7 search keys in __DATA section omitted for posting):

#! perl -slw use strict; use Benchmark::Timer; use Storable; my $T = new Benchmark::Timer; $T->start( 'load index' ); my $index = retrieve '570563.idx'; $T->stop( 'load index' ); $T->start( 'load datafile' ); my $ramfile; open my $dataFH, '<', '570563.dat' or die $!; sysread $dataFH, $ramfile, -s( '570563.dat' ); close $dataFH; open my $ramFH, '<', \$ramfile; $T->stop( 'load datafile' ); ## printf 'Check memory'; <STDIN>; while( <DATA> ) { $T->start( 'Perform 7-key search and retrieve 3 records' ); chomp; my @fields = split '\|'; my %matches; $matches{ $_ }++ for map{ @{ $index->[ $_ ]{ $fields[ $_ ] } } } 0, 2, 4, 6, 8 +, 10, 12; for ( grep{ $matches{ $_ }== 7 } keys %matches ) { seek( $ramFH, $_, 0 ); # print "$_: ", ## Disable printing for benc +hmark substr <$ramFH>, 0 , 100; } $T->stop( 'Perform 7-key search and retrieve 3 records' ); } $T->report; __DATA__ rlJUCO1XGHEA1lULImByZiPS7rIkJldqyQrc9gTvppxtOe3Ae6jIHbTnFKLCHZVco8T2lK +wz1HSnZ1bunO8gzwq9ftDtVWvjpuJU||Cg73CVHXVUiORGbVcXVVi0OueBikEIP7KvQKk +egTX6Co6wDqow5hRvYHEaWCrtkn1LZ6xR6Lnz1hmKSqOEy0WYBKOJ7tnRgJ7gYV||BgvL +wMdDqdQSyexHXtJWIhs5xPoePlH8I9I8AbZpOqpmFOD6N0PAzcgTxrr6BHBvSdFGD4t8H +m2o1mrhASjKXiaYz2HgrQBUBn3A||3gNrTZcXj9hyoVuBWfLxmwKqkaEqmErDJ7zDaJvI +QHrSoGgZm3Gzz1oTbg2lGB8m459jpzdEUIFanQHMAAfXuMPEjA9EdwgWAC7q||TYKbYAv +6m5vTvj7Hs7yIG86HeylN1nTkZgbc4P4MIULrYXHilMZQIJ0QtuNlt8IP45kCPhY52Ecr +f0LxdEYeCWw5vhzSoXOtCKAL||6GFr5Kwz1SvVaCChBoBQg6cJBGtvBcEIi8wbymtL8Jk +ISoz9lVA5S1hZepCJ3s3j5H9ui2LNfVaiKlSG6cpzeCajS710UxkNB3ym||jq27ex5Zwt +uVXyGQ2zvU4xchkvI8CdwudMX64tm196yMExgY7MmmYo8tzAo0hc5dcyrQDR5AfFdEQr7 +EnNPecuzlETBFD3Kn7YcP [999 similar records ommited for posting]

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

In reply to Re: Large Constant Database in Text File by BrowserUk
in thread Large Constant Database in Text File by stephentyler

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.