elef has asked for the wisdom of the Perl Monks concerning the following question:

I'm building a text search app using DBD::SQLite and FTS4. I'm completely new to databases, but I got the basic things working well. Now there are a couple of things I need advice on.

(1) I would like to compress the databases. A lot of people will be using dbs that contain 10 million records or more (upwards of 10GB or even 50GB uncompressed). This is all UTF-8 text data, some of it very repetitive so it compresses well. How do I go about this? The DBD::SQLite cpan page mentions compression briefly, and there is a bit on this at but I can find no actual working sample code that I could use. The page seems to assume that I will write the compress and uncompress functions myself, which is way above my pay grade. Is there a ready-made solution somewhere that I am missing?

(2) Currently, my app can handle one database file at a time. It does a search and returns the results ordered by length. I would like to add support for multiple database files, allowing the user to set a ranking and displaying hits in that order (first, all hits from 'tier 1' dbs ordered by length, then all hits from 'tier 2' dbs ordered by length etc.). Is this feasible? Would I need to ATTACH each of the dbs to the same connection? How can I run the same query on all the dbs? Create a foreach my $db (@databases) loop and iterate through them, executing the query in each one in turn?

(3) I would like to add match highlighting but the DBD::SQLite page says "The current FTS implementation in SQLite is far from complete with respect to utf8 handling : in particular, variable-length characters are not treated correctly by the builtin functions offsets() and snippet()." Has that been fixed since this was written or should I forget about offsets() and try to write my own code that tries to analyze each string returned from the database and find the search terms in it so I can highlight them?

I can post the my current (working) code if anyone's interested.

Replies are listed 'Best First'.
Re: DBD::SQLite questions
by ww (Archbishop) on Nov 27, 2013 at 17:31 UTC
    Too many questions with too many unrelated answers for a single thread.

      One thing a SQLite is a flat files based DB so you can compress the files using either gzip or other compression you like.

        Well, SQLite has internal hooks for compress() and uncompress() so I would like to use those. I just don't know how.
        From the documentation, it seems I would need to call
        $dbh->sqlite_create_function( 'compress', 1, sub { # magic incantation +s for compression } ); $dbh->sqlite_create_function( 'uncompress', 1, sub { # magic incantati +ons for uncompression } );

        Then when I call INSERT or SELECT I need to compress() and uncompress() the text strings.
        Maybe I could use IO::Compress::Zip for the magic incantations...?
        That's what I can gather, but I can find no specifics on implementation.