in reply to Fast - Compact That String
How does 16 second/million (both ways) including sub call overhead compare?
#! perl -slw
use strict;
use Time::HiRes qw[ time ];
my @c = (' ', '0'..'9', 'A'..'Z' );
sub fromB37 {
my $n = shift;
join '', map {
my $c = $n %37;
$n = int( $n / 37 );
$c[ $c ];
} 1 .. 6;
}
my %c = map{ $c[ $_ ] => $_ } 0 .. 36;
sub toB37 {
my $n = 0;
$n = $n * 37 + $c{$_} for reverse unpack '(a)*', $_[0];
return $n;
}
my $start = time;
for ( 1 .. 1e6 ) {
my $n = int( rand 37**6 );
$n == toB37( fromB37( $n ) ) or die $n;
}
printf "Took %.3f second\n", time() - $start;
__END__
C:\test>junk45
Took 16.404 second
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
Re^2: Fast - Compact That String
by BrowserUk (Patriarch) on Feb 09, 2012 at 23:30 UTC
|
#! perl -slw
use strict;
use Time::HiRes qw[ time ];
my @c1 = (' ', '0'..'9', 'A'..'Z' );
sub fromB37 {
my $n = shift;
my $s = ' ';
substr( $s, $_, 1, $c1[ $n%37 ] ), $n /= 37 for 0 .. 5;
$s;
}
my @c2;
$c2[ ord( $c1[ $_ ] ) ] = $_ for 0 .. 36;
sub toB37 {
my $n = 0;
$n = $n * 37 + $c2[$_] for reverse unpack 'C*', $_[0];
$n;
}
my $start = time;
for ( 1 .. 1e6 ) {
my $s = fromB37( rand 37**6 );
}
printf "fromB37 took %.3f seconds/million\n", time() - $start;
$start = time;
for ( 'AAAAA' .. 'CEXHN' ) {
my $n = toB37( $_ );
}
printf "toB37 took %.3f second/million\n", time() - $start;
my @data = map int( rand 37**6 ), 1 .. 1e6;
$start = time;
$_ == toB37( fromB37( $_ ) ) or die $_ for @data;
printf "Both ways took %.3f second/million\n", time() - $start;
__END__
C:\test>junk45
fromB37 took 5.309 seconds/million
toB37 took 3.234 second/million
Both ways took 8.953 second/million
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
| [reply] [d/l] |
|
my @c3 = map { my $c = $_; (map { "$_$c" } @c1) } @c1;
sub myFromB37 {
my $n = shift;
my $s = ' ';
substr( $s, 0, 2, $c3[$n % 1369] );
$n /= 1369;
substr( $s, 2, 2, $c3[$n % 1369] );
$n /= 1369;
substr( $s, 4, 2, $c3[$n] );
$s;
}
my @c4;
$c4[unpack 'S', $c3[$_]] = $_ foreach 0 .. 1368;
sub myToB37 {
1874161 * $c4[unpack 'S', substr($_[0], 4, 2)] +
1369 * $c4[unpack 'S', substr($_[0], 2, 2)] +
$c4[unpack 'S', substr($_[0], 0, 2)];
}
my @num_data = map int( rand 37**6 ), 1 .. 1e6;
my @asc_data = map " $_", 'AAAAA' .. 'CEXHN';
The last lines pre-generate test data because my version depends on strings being no less then 6 characters; the rest of the benchmarking is analogous to yours. You could save one "reverse" in your version though by putting the characters in reverse order, which they should be anyway to represent a normal left-to-right base37 number. | [reply] [d/l] |
|
Nice!++ On my machine, both your routines are almost exactly twice as fast as mine.
I get the table sizes as totaling around 400k: @c3:87792 @c4:295160, but that is a perfect trade for speed.
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
| [reply] |
|
mbethke,
Very nice. I had to slightly modify it in order to get the sort order I wanted:
my @c3 = map { my $c = $_; (map { "$c$_" } @c1) } @c1;
sub myFromB37 {
my $n = shift;
my $s = ' ';
substr( $s, 4, 2, $c3[$n % 1369] );
$n /= 1369;
substr( $s, 2, 2, $c3[$n % 1369] );
$n /= 1369;
substr( $s, 0, 2, $c3[$n] );
$s;
}
my @c4;
$c4[unpack 'S', $c3[$_]] = $_ foreach 0 .. 1368;
sub myToB37 {
1874161 * $c4[unpack 'S', substr($_[0], 0, 2)] +
1369 * $c4[unpack 'S', substr($_[0], 2, 2)] +
$c4[unpack 'S', substr($_[0], 4, 2)];
}
If you have time, I would really appreciate an in-depth explanation.
| [reply] [d/l] |
|
|
| [reply] |
|
|
|
|
## Map the numbers, 0 .. 36 to the symbols we use
## to represent the number in base37
my @c1 = (' ', '0'..'9', 'A'..'Z' );
sub fromB37 {
my $n = shift; ## Get the number to convert
## Allocate space for the Base37 representation
## Initialise it to the representation of 0 (six spaces)
my $s = ' ';
## For each position in the string
for( 0 .. 5 ) {
## extract the next base37 digit value
## look up its representaion character
## and assign it to the 'right place' i the string.
substr( $s, $_, 1 ) = $c1[ $n%37 ] );
## dividing by 37 effectively right-shifts
## the last digit's value out of the number
$n /= 37;
}
$s;
}
my @c2;
## Map the ordinal values of the symbols
## to their numeric values (0 .. 37)
## The sparse array is faster than a hash
$c2[ ord( $c1[ $_ ] ) ] = $_ for 0 .. 36;
sub toB37 {
my $n = 0; ## initialise our return value to 0
## split the base37 representation
## into a list of the ordinal values of the symbols
## and reverse their order to match that produced by fromB37()
for( reverse unpack 'C*', $_[0] ) {
## multiple the running total by 37
## (effectively left-shifting the accumulator
## to accommodate the next digit.)
## and add value of the next base37 digit
## by looking it up in the mapping array
$n = $n * 37 + $c2[ $_ ];
}
## return the accumulated value.
$n;
}
As mbethke points out, these treat the base37 number in 'little-endian' fashion. This because you emphasised compression and speed, with no mention of needing to manipulate the compressed values numerically (sorting).
To get the sortable, big-endian representation, you could use: my @c1 = (' ', '0'..'9', 'A'..'Z' );
sub fromB37 {
my $n = shift;
my $s = ' ';
substr( $s, $_, 1, $c1[ $n%37 ] ), $n /= 37 for 5, 4, 3, 2, 1, 0;
$s;
}
my @c2;
$c2[ ord( $c1[ $_ ] ) ] = $_ for 0 .. 36;
sub toB37 {
my $n = 0;
$n = $n * 37 + $c2[$_] for unpack 'C*', $_[0];
$n;
}
Which actually works out a bit quicker still, but not as fast as mbethke's unrolled version.
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
| [reply] [d/l] [select] |
|
|