The thing is that I compress each article separately, and I'm pretty sure I'd get much better results if I could compress them with a shared dictionary or something.

This is a classic time vs. space problem. If you compress the articles together, every time you want to read one you have to (at least) decompress all the others that came before it.
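To put a number on the space side of that tradeoff, here is a rough sketch that compresses a batch of articles one by one and then as a single blob. The articles are made up (a hypothetical word list standing in for real text), so the ratio you see here says nothing about your data; only the method carries over.

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);

# Hypothetical articles that share vocabulary, as related real articles would.
my @words    = qw(database article compress dictionary berkeley perl module storage index record);
my @articles = map {
    my $n = $_;
    "Article $n: " . join ' ', map { $words[ ($n * $_) % @words ] } 1 .. 100;
} 1 .. 50;

# Compress one string in memory and return the gzip bytes.
sub gz {
    my ($in) = @_;
    my $out;
    gzip \$in => \$out or die "gzip failed: $GzipError";
    return $out;
}

# Total size when every article is its own compressed record.
my $separate = 0;
$separate += length( gz($_) ) for @articles;

# Size when all articles are compressed as one stream.
my $together = length( gz( join "\n", @articles ) );

printf "separate: %d bytes, together: %d bytes\n", $separate, $together;
```

The price for the smaller "together" figure is exactly the one described above: to read one article out of the combined stream you have to decompress everything in front of it.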

Instead of Zlib compression, you can try BZip2 or LZMA. They usually compress noticeably better; see, for example, the modules IO::Compress::Bzip2 and IO::Compress::Lzma, respectively.
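Swapping the compressor is a small change if you use the one-shot functions of the IO::Compress family. A minimal sketch, assuming the article is already in a scalar (the sample text below is a made-up placeholder); as far as I know IO::Compress::Lzma follows the same interface, but it is not a core module, so I left it out here:

```perl
use strict;
use warnings;
use IO::Compress::Gzip      qw(gzip $GzipError);
use IO::Compress::Bzip2     qw(bzip2 $Bzip2Error);
use IO::Uncompress::Bunzip2 qw(bunzip2 $Bunzip2Error);

# Hypothetical stand-in for one article; real text will compress differently.
my $article = join ' ', ('The quick brown fox jumps over the lazy dog.') x 200;

my ( $gz, $bz2, $restored );

# Scalar references make the functions work on in-memory buffers instead of files.
gzip  \$article => \$gz  or die "gzip failed: $GzipError";
bzip2 \$article => \$bz2 or die "bzip2 failed: $Bzip2Error";

printf "raw: %d bytes, gzip: %d bytes, bzip2: %d bytes\n",
    length $article, length $gz, length $bz2;

# Round trip to confirm the stored value can be read back.
bunzip2 \$bz2 => \$restored or die "bunzip2 failed: $Bunzip2Error";
die "round trip mismatch" unless $restored eq $article;
```

The compressed scalars are plain byte strings, so they can be stored as database values the same way the Zlib output is stored now.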

You could, in theory, use Archive::Tar to compress multiple articles into a single file when the optional IO::Zlib is installed.

Does any of this really give you a better compression ratio, and if it does, how much will it affect your loading time? You really have to build a few simple test cases with a few hundred randomly selected articles, I guess. I think using Bzip2 or LZMA could actually improve both, since CPUs are generally fast at decompressing and you'll read less data from the hard disk. But generating the data will be very slow.

As for Archive::Tar, my guess is it will slow things down while not saving any relevant space compared to your existing solution of using GZip.

But, as I said, you should really test it for yourself with a relevant (randomly selected) subset of the data you will use in the full project. Only that will give you a reliable view of the space/time tradeoffs relevant to your project.
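Such a test does not need to be fancy. Here is a rough sketch of a harness that reports total compressed size and decompression time per codec. The @sample array is a hypothetical placeholder; fill it with a few hundred articles pulled at random from the real database, because the synthetic text below compresses unrealistically well. It only times in-memory decompression, not disk reads, so treat the timings as one half of the picture.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);
use List::Util  qw(sum);
use IO::Compress::Gzip      qw(gzip $GzipError);
use IO::Uncompress::Gunzip  qw(gunzip $GunzipError);
use IO::Compress::Bzip2     qw(bzip2 $Bzip2Error);
use IO::Uncompress::Bunzip2 qw(bunzip2 $Bunzip2Error);

# Hypothetical sample data; replace with randomly selected real articles.
my @sample = map { "Article $_: " . ( 'lorem ipsum dolor sit amet ' x ( 20 + $_ % 30 ) ) } 1 .. 300;

# Each codec is a pair of in-memory pack/unpack routines.
my %codec = (
    gzip => {
        pack   => sub { my ($in) = @_; my $out; gzip \$in => \$out or die $GzipError; $out },
        unpack => sub { my ($in) = @_; my $out; gunzip \$in => \$out or die $GunzipError; $out },
    },
    bzip2 => {
        pack   => sub { my ($in) = @_; my $out; bzip2 \$in => \$out or die $Bzip2Error; $out },
        unpack => sub { my ($in) = @_; my $out; bunzip2 \$in => \$out or die $Bunzip2Error; $out },
    },
);

# Compress every article separately, then time how long reading them all back takes.
sub measure {
    my ( $c, $articles ) = @_;
    my @packed   = map { $c->{pack}->($_) } @$articles;
    my $start    = time();
    my @restored = map { $c->{unpack}->($_) } @packed;
    my $elapsed  = time() - $start;
    for my $i ( 0 .. $#$articles ) {
        die "round trip mismatch" unless $restored[$i] eq $articles->[$i];
    }
    return { bytes => sum( map { length $_ } @packed ), seconds => $elapsed };
}

my %result = map { $_ => measure( $codec{$_}, \@sample ) } sort keys %codec;
my $raw    = sum( map { length $_ } @sample );

printf "%-6s %8d bytes\n", 'raw', $raw;
printf "%-6s %8d bytes  %.4f s to decompress\n", $_, $result{$_}{bytes}, $result{$_}{seconds}
    for sort keys %result;
```

Adding another codec (LZMA, or a tar-based variant) is then a matter of adding one more pack/unpack pair to %codec.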

BREW /very/strong/coffee HTTP/1.1
Host: goodmorning.example.com

418 I'm a teapot

In reply to Re: How to efficiently compress a Berkeley database? by cavac
in thread How to efficiently compress a Berkeley database? by grondilu
