MimisIVI, if you mean vec, Tachyon-II pointed you at vec 5 days ago in Re^6: Compress positive integers. It's right there in the last line. Maybe you missed it?
Anyhow, here is Elias Gamma coded in pure Perl:
sub log2{ int( ( log( $_[0] ) + 1e-15 ) / log( 2 ) ) } sub PP_EliasPack { my $packed = ''; my $out = 0; for my $num ( @_ ) { my $len = log2( $num ); $out += $len; vec( $packed, $out++, 1 ) = 1; vec( $packed, $out++, 1 ) = ( $num & ( 1 << $_ ) ) ? 1 : 0 for 0 .. $len - 1; } return $packed; } sub PP_EliasUnpack { my $packed = shift; my $bits = length( $packed ) * 8; my $in = 0; my @unpacked; while( $in < $bits ) { my( $len, $num ) = ( 0 ) x 2; $len++ while $in < $bits && vec( $packed, $in++, 1 ) == 0; last if $in == $bits; vec( $packed, $in++, 1 ) and $num |= ( 1 << $_ ) for 0 .. $len + - 1; $num |= ( 1 << $len ); push @unpacked, $num; } return @unpacked; }
And for reference, here is the results of my benchmark with that included:
C:\test>678848 Run with 15000 unique words in 1000 documents (Ave: 554 words/doc) ASCII uses 4755336 bytes W-BER uses 3196819 bytes Binary uses 3279128 bytes Elias uses 3980063 bytes PP_Elias uses 3980063 bytes 1 trial of Packing ascii (10.203s total) 1 trial of Unpacking ascii (3.159s total) 1 trial of Packing W-BER (18.159s total) 1 trial of Unpacking W-BER (1.516s total) 1 trial of Packing binary (9.910s total) 1 trial of Unpacking binary (1.455s total) 1 trial of Packing Elias (13.613s total) 1 trial of Unpacking Elias (2.739s total) 1 trial of Packing PP_Elias (31.128s total) 1 trial of Unpacking PP_Elias (18.094s total)
Notice that PP_Elias achieves identical compression to Elias in C (as you'd expect :), but that it's twice as slow at packing, and nearly 10 times slower when unpacking.
If you want to continue with using Elias, and can't build the C version yourself, I could let you have it pre-built (for win32), but I have to wonder why you would when W-BER is faster and achieves better compression?
Also, I wonder if you saw Re^8: Byte allign compression in Perl.. where I demonstrated that you can have the DB do the selection for you using the schema I suggested way back when, in 0.312 of a second? For the record, I found a small optimisation in the schema that reduce that by a factor of 10, to 31 milliseconds.
So the DB does the selection, sends you just the data you need to do your proximity calculations, and does it all faster than you could pack a single integer. Interested?
In reply to Re^2: Do you really want to use an array there?
by BrowserUk
in thread Do you really want to use an array there?
by deprecated
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |