in reply to More details on compressing a set of integers
in thread Compressing a set of integers
Its worth pointing out that attempting to compress a string containing a list of comma delimited ascii numbers using Base64, will make the string grow in size, not shrink.
Eg. A string containing 1000 5 digit numbers separated by commas, is 6000 bytes. Encode that base64 and you get a string nearly 8K in length.
Pack the same string using 16-bits and you get 2000 bytes.
If you really need to make it smaller still, then you might consider using Compress::Zlib::memGzip which compresses the same ascii list into 1800 bytes.
There are caveats associated with this 10% reduction in size. A) your compression will vary with the data set. B) Compress::Zlib::memGunzip is 10 to 15 times slower than unpack.
If your main concern is that using pack & unpack with format 'S*', will limit you to 16-bit numbers, then move to 32-bit using 'L*'. This will pack the same 1000 numbers into 4000 bytes which makes it look not such a good option until you think about what happens when you start using 6-digit numbers.
The string containing 100,000 .. 101,000, comma delimited ascii is 7000 bytes. Base64:9467, zipped:1817, packed 32-bit 4000. Again, Zlib is looking good, but now the figures for a random mixed of 1 to 6-digit numbers
ascii:6458, base64:8726, zipped:3169, pack 32-bit:4000.
Suddenly the trade for decompression speed looks less valuable, and if you move to a random mix of 8-digit numbers, Zlib starts averaging 4500 bytes for 1000 numbers, pack remains at 4000 bytes.
Finally, with Zlib, there is a heavy penalty for small lists. Lists of 1 x 5-digits average around 25 bytes; 4 x 5-digits numbers around 45 bytes and 4 x 8-digit numbers around 75 bytes. The corresponding pack'd lists come out at 2/8 for 'S*' and 4/16/16 for 'L*'.
Examine what is said, not who speaks.
The 7th Rule of perl club is -- pearl clubs are easily damaged. Use a diamond club instead.
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re: Re: More details on compressing a set of integers
by toma (Vicar) on Jan 26, 2003 at 07:08 UTC |