Re^3: How do I read from a compressed SQLite FTS4 database with DBD::SQLite?

Re^4: How do I read from a compressed SQLite FTS4 database with DBD::SQLite? by elef (Friar) on Nov 29, 2013 at 16:34 UTC
Thanks. I did have `sqlite_unicode => 1`, and the db worked with non-ASCII text as long as I didn't try to compress it. This seems to fix the problem:

`use Encode qw(encode decode); use IO::Compress::Gzip qw(gzip); use IO::Uncompress::Gunzip qw(gunzip); sub compressor { my $in = shift; $in = encode('utf8', $in); my $out; gzip \$in => \$out; return $out; } sub uncompressor { my $in = shift; my $out; gunzip \$in => \$out; return decode('utf8', $out); }`

I tested it with some real-life sample data and the compression isn't doing too well: the source data is a 9.4MB text file that compresses down to a 2.4MB zip file. When I import it without compression, I get a 19.7MB db file. With compression, the db file is 17.0MB. That's a little smaller than the uncompressed db, but not enough to make it worth it. I was hoping for something in the 10MB range (~50% compression).

I imagine it could be because each string is compressed separately, so repeated strings or parts of strings can't be exploited during compression. Is this a lost battle? If not, I would be grateful for suggestions on a better algorithm.
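The guess about separately compressed strings can be checked without a database. The following is only a rough sketch, not a measurement of the real data: the sample sentences are made up, and it simply compares gzipping each short string on its own with gzipping the same text as one stream. Keep in mind that every gzip stream carries at least 18 bytes of header and trailer, which is a large share of a one-sentence value.

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);

# Hypothetical sample data: 200 short, similar sentences.
my @sentences = map { "The quick brown fox jumps over lazy dog number $_." } 1 .. 200;

# Compress every sentence on its own, the way a per-value
# compress hook sees the data (one call per stored string).
my $separate = 0;
for my $s (@sentences) {
    my $out;
    gzip \$s => \$out or die "gzip failed: $GzipError";
    $separate += length $out;
}

# Compress the same text as a single stream, the way an archiver would.
my $all = join "\n", @sentences;
my $together;
gzip \$all => \$together or die "gzip failed: $GzipError";

my $raw = length $all;
printf "raw text:           %d bytes\n", $raw;
printf "per-sentence gzip:  %d bytes\n", $separate;
printf "single-stream gzip: %d bytes\n", length $together;
```

If the per-sentence total comes out far above the single-stream figure, the gap between the 2.4MB zip file and the 17.0MB db is at least partly explained by lost shared context plus per-string overhead, rather than by the choice of algorithm alone.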
Re^5: How do I read from a compressed SQLite FTS4 database with DBD::SQLite? by taint (Chaplain) on Nov 30, 2013 at 06:39 UTC
Greetings, elef.

Have you tried any of the other forms of compression IO::Compress offers? In my personal experience creating archives, the xz algorithm gives better results more often than not, and I notice IO::Compress also offers IO::Compress::Xz. Of course, all the algorithms give different results depending on the type of input data, but I thought it worth mentioning.

Best wishes.

--Chris
#!/usr/bin/perl -Tw use Perl::Always or die; my $perl_version = (5.12.5); print $perl_version;
Re^6: How do I read from a compressed SQLite FTS4 database with DBD::SQLite? by elef (Friar) on Nov 30, 2013 at 09:57 UTC
Thanks for the suggestion. I use ActivePerl on Windows and IO::Compress::Xz is not in PPM. I tried to install it from the cpan shell, but running the script fails with `Can't locate auto/Compress/Raw/Lzma/autosplit.ix in @INC`. Maybe I will try to get the module installed in a Linux VM and see how it performs.

I do have IO::Compress::Zip, so I tried that, but the compressed db was much larger than an uncompressed one with the same data... It's starting to look like I'm at a dead end.

Again, I think the compression is failing this badly because each column in each record is compressed separately by the FTS engine and I have text here in small chunks (sentences). But I don't think it would be feasible to structure the db differently because of the way the data is used (you search for a term or phrase and the program shows you each sentence that it occurs in, along with its translation in a different language).
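One likely reason the Zip attempt grew the db is container overhead: a zip archive wraps every member in a local header, a central directory record and an end-of-directory record, and with one archive per sentence that is paid for every value. As a sketch only, assuming the rest of the IO::Compress distribution is installed alongside IO::Compress::Zip, the snippet below measures one made-up sentence in each container and shows a compressor/uncompressor pair built on raw deflate, which has no header or trailer at all. It does not address the missing shared context between sentences, so treat it as a way to trim overhead, not as a fix.

```perl
use strict;
use warnings;
use Encode qw(encode decode);
use IO::Compress::Gzip         qw(gzip $GzipError);
use IO::Compress::Zip          qw(zip $ZipError);
use IO::Compress::RawDeflate   qw(rawdeflate $RawDeflateError);
use IO::Uncompress::RawInflate qw(rawinflate $RawInflateError);

# Same shape as the pair earlier in the thread, but using raw deflate.
sub compressor {
    my ($text) = @_;
    my $in = encode('UTF-8', $text);
    my $out;
    rawdeflate \$in => \$out or die "rawdeflate failed: $RawDeflateError";
    return $out;
}

sub uncompressor {
    my ($in) = @_;
    my $out;
    rawinflate \$in => \$out or die "rawinflate failed: $RawInflateError";
    return decode('UTF-8', $out);
}

# Hypothetical sample: one short sentence with a non-ASCII character.
my $sentence = "Der schnelle braune Fuchs springt \x{fc}ber den faulen Hund.";
my $bytes    = encode('UTF-8', $sentence);

my ($gz, $zipped);
gzip \$bytes => \$gz     or die "gzip failed: $GzipError";
zip  \$bytes => \$zipped or die "zip failed: $ZipError";

my %size = (
    raw        => length $bytes,
    rawdeflate => length compressor($sentence),
    gzip       => length $gz,
    zip        => length $zipped,
);
printf "%-10s %d bytes\n", $_, $size{$_} for qw(raw rawdeflate gzip zip);
```

The two subs take and return the same kinds of values as the gzip-based pair, so they could be swapped in wherever that pair is registered.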
Re^7: How do I read from a compressed SQLite FTS4 database with DBD::SQLite? by taint (Chaplain) on Nov 30, 2013 at 19:54 UTC