You say you don't know if your numbers will fit into 16 bits, and that using 32 bits per number simply doubles the storage space.

May I point out that pack/unpack has an option to store numbers with built-in compression? They're called "BER-compressed integers" (see perlfunc:pack, and use the "w" template), and they use as few bytes as possible. They're slightly less efficient in storage for the largest numbers, as only 7 bits per byte are significant. But on the plus side, they don't waste any bytes they don't need, so for not-so-extreme values the waste is no more than with the fixed-length representation, and often less.
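
To make that concrete, here is a minimal sketch of packing and unpacking with the "w" template. The sample numbers are made up for illustration, and it assumes all values are non-negative integers, since "w" does not handle negative numbers.

```perl
use strict;
use warnings;

# Illustrative sample data (not from the original question).
my @numbers = (5, 300, 70000);

# 'w*' packs every value as a BER-compressed integer.
my $packed = pack 'w*', @numbers;

# Compare against three fixed 32-bit values packed with 'N*'.
my $fixed = pack 'N*', @numbers;
printf "BER: %d bytes, fixed 32-bit: %d bytes\n",
    length($packed), length($fixed);

# Unpacking with the same template restores the original list.
my @restored = unpack 'w*', $packed;
print "@restored\n";
```

Here 5 fits in one byte, 300 in two and 70000 in three, so the packed string is 6 bytes long against 12 for the fixed-width version.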

Experimentally I've determined the following ranges for the byte count:

range                          byte count
0 - 127                        1
128 - 16383                    2
16384 - 2097151                3
2097152 - 268435455            4
268435456 - 4294967295 (?)     5

The range for 5 bytes runs at least up to 4294967295, i.e. 2**32-1, but above that, something goes wrong, at least on Indigoperl 5.6.1/Win32. Suddenly, I get a value for 2**32 that doesn't make any sense: it takes 10 bytes, for a start, while 2**32-1 only takes 5. In theory, this kind of representation has no upper limit, so I have no idea what's going on.
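
The boundaries in the table can be checked with a short loop like the one below. This is only a sketch: the helper name ber_length is my own, and the loop deliberately stays below 2**32 to steer clear of the odd behaviour just described.

```perl
use strict;
use warnings;

# Hypothetical helper: number of bytes pack 'w' uses for one value.
sub ber_length {
    my ($n) = @_;
    return length pack 'w', $n;
}

# Walk the edges 128, 16384, 2097152 and 268435456, using integer
# multiplication so the values stay plain integers.
my $edge = 1;
my @report;
for my $step (1 .. 4) {
    $edge *= 128;
    push @report, [ $edge - 1, ber_length($edge - 1), $edge, ber_length($edge) ];
}

# Print the byte count just below and exactly at each edge.
printf "%10d -> %d byte(s)   %10d -> %d byte(s)\n", @$_ for @report;

# The largest 32-bit value, which is still inside the 5-byte range.
printf "%10d -> %d byte(s)\n", 4294967295, ber_length(4294967295);
```

Each edge is a power of 128 because every byte carries 7 significant bits.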

That problem aside, here's an extra way you can save on space: sort the numbers in ascending order, and store the difference between successive numbers. The first stored value is the smallest number itself, i.e. its difference from zero. The more numbers you have, the closer they'll be together, and that way, you might shave off a few bytes.
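
A rough sketch of that difference scheme follows. The function names pack_deltas and unpack_deltas and the sample set are my own inventions for illustration, and the code assumes non-negative integers where the original order of the numbers doesn't matter.

```perl
use strict;
use warnings;

# Sort ascending and store each value as the gap from its predecessor;
# the first gap is measured from zero.
sub pack_deltas {
    my @sorted = sort { $a <=> $b } @_;
    my $prev = 0;
    my @gaps;
    for my $n (@sorted) {
        push @gaps, $n - $prev;
        $prev = $n;
    }
    return pack 'w*', @gaps;
}

# Rebuild the sorted numbers by keeping a running sum of the gaps.
sub unpack_deltas {
    my ($packed) = @_;
    my $sum = 0;
    my @numbers;
    for my $gap (unpack 'w*', $packed) {
        $sum += $gap;
        push @numbers, $sum;
    }
    return @numbers;
}

# Illustrative sample set.
my @set    = (100_300, 100_000, 250_000, 100_050);
my $plain  = pack 'w*', @set;
my $deltas = pack_deltas(@set);

printf "plain: %d bytes, deltas: %d bytes\n", length($plain), length($deltas);
print join(' ', unpack_deltas($deltas)), "\n";
```

With this sample, the plain packing needs 3 bytes per number (12 in total), while the gaps 100000, 50, 250 and 149700 need 3 + 1 + 2 + 3 = 9 bytes. The numbers come back in sorted order, not in their original order.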


In reply to Re: Compressing a set of integers by bart
in thread Compressing a set of integers by toma
