Dear monks,
I would like to compress SQLite FTS databases (utf-8 text data) to reduce file sizes and perhaps even improve performance. Based on documentation here and here and a lot of guesswork, I came up with some partially working code. It creates the db, and it seems to compress it as the file is smaller than the uncompressed db with the same entries. When I run a SELECT query on the compressed db, I get a correct report about the number of hits but the text content itself doesn't show up. What am I doing wrong?
use strict; use warnings; use utf8; use DBI; use IO::Compress::Gzip qw(gzip $GzipError) ; use IO::Uncompress::Gunzip qw(gunzip $GunzipError) ; sub compressor; sub uncompressor; print "\nTesting 'compressor' and 'uncompressor' subs: " . uncompresso +r(compressor("test passed")) . "\n"; my $dbh = DBI->connect( "dbi:SQLite:dbname=test.db", "", "", { # RaiseError => 1, # we'll hand +le the errors ourselves sqlite_unicode => 1, } ) or die "Can't connect to database: $DBI::err +str";; # database connection var $dbh->sqlite_create_function( 'compressme', 1, sub { compressor } ); $dbh->sqlite_create_function( 'uncompressme', 1, sub { uncompressor } +); $dbh->do("DROP TABLE IF EXISTS EnHu") or print "drop table failed"; # $dbh->do("CREATE VIRTUAL TABLE EnHu USING fts4 (English TEXT, Hungar +ian TEXT)") or print "failed to create fts4 table"; # uncompre +ssed $dbh->do("CREATE VIRTUAL TABLE EnHu USING fts4 (English TEXT, Hungaria +n TEXT, compress=compressme, uncompress=uncompressme)") or print "fai +led to create fts4 table"; # compressed while (<DATA>) { chomp; /^([^\t]*)\t([^\t]*)/; my $lang1 = $1; my $lang2 = $2; # print "\n\ninserting:\n$lang1\n$lang2"; my $insert = $dbh->prepare('INSERT INTO EnHu (English, Hungarian) +VALUES(?, ?)'); # my $insert = $dbh->prepare('INSERT INTO EnHu (English, Hungarian +) VALUES(compressme(?), compressme(?))'); #no need to call the fun +ctions, SQLite seems to do it on its own $insert->execute("$lang1", "$lang2"); } my $countentries = $dbh->prepare( "SELECT Count(*) FROM EnHu" ); # OK $countentries->execute(); my $size = $countentries->fetch(); my $searchterm = "some"; print "\n\nImport done, the db contains @$size records.\nEnter search +term to run query\n"; chomp($searchterm = <STDIN>); my $q_1 = $dbh->prepare( "SELECT * FROM EnHu WHERE English MATCH ?" ) +or print "prepare failed"; # works on uncompressed db, kinda works + on compressed db # my $q_1 = $dbh->prepare( "SELECT uncompressme(English) FROM EnHu WHE +RE English MATCH ?" ) or print "prepare failed"; # doesn't work # my $q_1 = $dbh->prepare( "SELECT uncompressme(English),uncompressme( +Hungarian) FROM EnHu WHERE English MATCH ?" ) or print "prepare faile +d"; # my $q_1 = $dbh->prepare( "SELECT uncompressme(English) FROM EnHu WHE +RE English MATCH ?" ) or print "prepare failed"; # my $q_1 = $dbh->prepare( "SELECT * FROM EnHu WHERE uncompressme(Engl +ish) MATCH ?" ); # doesn't work # my $q_1 = $dbh->prepare( "SELECT * FROM EnHu WHERE English MATCH com +pressme(?)" ); # nope $q_1->execute($searchterm) or print "execute failed"; # my $q_2 = $dbh->prepare( "SELECT * FROM EnHu LIMIT 4" ); # $q_2->execute() or print "execute failed"; # my $all = $q_2->fetchall_arrayref() or print "fetch failed"; my $all = $q_1->fetchall_arrayref() or print "fetch failed"; my $i = @$all; foreach my $row (@$all) { print "\n--------------------\n"; foreach my $item (@$row) { print $item; print "\t"; }; } print "\n\n"; print "\n$i matches found."; <STDIN>; sub compressor { my $in = shift; my $out; gzip \$in => \$out; return ($out); } sub uncompressor { my $in = shift; my $out; gunzip \$in => \$out; return ($out); } __DATA__ this is some text némi szöveg some more text here na még egy kicsi just for testing you know nalássuk fourth row of text még némi szöveg and a little more text for good measure dobjuk hozzá még ezt is

In reply to How do I read from a compressed SQLite FTS4 database with DBD::SQLite? by elef

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.