I was surprised.

I have nothing to say for loading time. I never imagined execute and fetch become so fast. It is slower than hash lookup, but it becomes really fast.

And smaller memory usage. These are output of "ps -axorss,vsz -p $$" before loading, after loaded. Size is in KB.

>perl 07-1.pl  #load to hash
before:
  RSS   VSZ
 2736  5036
after:
  RSS   VSZ
61332 63404

>perl 07-2.pl  #sqlite in-memory
before:
  RSS   VSZ
 4612  7308
after:
  RSS   VSZ
 7796  9992
File size of database is 37MB, if I dump this table to text, it becomes 8.9MB. I wonder how they load it in-memory?

Below is some test results of lookup. It loads data and lookup ARG times from 283600 records/hash.
>perl 07.pl 100
            (warning: too few iterations for a reliable count)
               s/iter 03_sqlite_disk      01_substr  02_sqlite_mem
03_sqlite_disk   24.0             --           -89%           -97%
01_substr        2.72           780%             --           -77%
02_sqlite_mem   0.627          3720%           334%             --

I cut SQLite on disk here after.
>perl 07.pl 1000
              s/iter     01_substr 02_sqlite_mem
01_substr       2.71            --          -75%
02_sqlite_mem  0.687          295%            --

>perl 07.pl 10000
              s/iter     01_substr 02_sqlite_mem
01_substr       2.74            --          -61%
02_sqlite_mem   1.07          157%            --

>perl 07.pl 50000
              s/iter     01_substr 02_sqlite_mem
01_substr       2.80            --           -3%
02_sqlite_mem   2.72            3%            --

>perl 07.pl 100000
              s/iter 02_sqlite_mem     01_substr
02_sqlite_mem   4.59            --          -36%
01_substr       2.92           57%            --

And test code.
#!/usr/bin/perl #SQLite mem test use strict; use warnings; use Time::HiRes; use Benchmark qw/cmpthese/; use DBI; my $href; my @t_synsets; my $DB_DISK='./wnjpn.db'; my $DB_TO_MEM='./wn-synlink.db'; #load @ARGV[0] of synset for lookup test load_test_synsets(); my %tests = ( '01_substr' => \&test1, '02_sqlite_mem' => \&test2, # '03_sqlite_disk' => \&test3, ); cmpthese( -20, #for 20 cpu secs \%tests ); sub test1{ #substr $href={}; open(my $fh, "<", "04.txt") or die $!; while(<$fh>){ chomp; push @{ $href->{ substr($_,0,10)} }, [ substr($_,10,10), subst +r($_,20)]; } close $fh; #iterate my $cnt=0; for( @t_synsets ){ if (exists $href->{$_} ){ $cnt += @{ $href->{$_} }; } } #print "test1:arrays cnt=$cnt\n"; } sub test2{ #sqlite mem my $mem_dbh = DBI->connect('dbi:SQLite:dbname=:memory:'); $mem_dbh->sqlite_backup_from_file($DB_TO_MEM); my $sth=$mem_dbh->prepare("select synset2 from synlink where synse +t1=?"); #iterate my $cnt=0; for( @t_synsets ){ $sth->execute($_); my $tmp=$sth->fetchall_arrayref; $cnt+=@{$tmp}; } $sth->finish; #print "test2:arrays cnt=$cnt\n"; $mem_dbh->disconnect; } sub test3{ #sqlite disk my $dbh = DBI->connect("dbi:SQLite:dbname=$DB_DISK") or die DBI->e +rrstr; my $sth=$dbh->prepare("select synset2 from synlink where synset1=? +"); #iterate my $cnt=0; for( @t_synsets ){ $sth->execute($_); my $tmp=$sth->fetchall_arrayref; $cnt+=@{$tmp}; } $sth->finish; #print "test3:arrays cnt=$cnt\n"; $dbh->disconnect; } #$ARGV[0] synsets to lookup sub load_test_synsets{ my $dbh=DBI->connect("dbi:SQLite:./wnjpn.db","","", { AutoCommit=>0, #sqlite_use_immediate_transaction=>1, #sqlite_unicode=>1, RaiseError=>1, }) or die DBI->errstr; if ( defined($ARGV[0]) && $ARGV[0] =~ /^\d+$/){ @t_synsets = map{$_->[0]} @{$dbh->selectall_arrayref("select synset FROM synset limi +t $ARGV[0]")}; } else { @t_synsets = map{$_->[0]} @{$dbh->selectall_arrayref("select synset FROM synset")}; } $dbh->disconnect; }
I would like to lookup around 2000 to 15000 records, so SQLite in-memory suit me fine. Thanks for information.


In reply to Re^2: Loading 283600 records (WordNet) by remiah
in thread Loading 283600 records (WordNet) by remiah

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.