Re: Fast - Compact That String

by tobyink (Canon)
on Feb 09, 2012 at 23:06 UTC ( [id://952877]=note )


in reply to Fast - Compact That String

Here's my attempt. There are two versions of the functions here. The first uses multiplication and division to implement something along the lines of what you describe: each possible six-character string is mapped to an integer in the range 0 .. 2_565_726_408 and encoded as 4 bytes.

The second deals with the first three characters and the last three characters separately, mapping each group to an integer in the range 0 .. 50652 and encoding each as a 2-byte string, 4 bytes altogether.
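The two ranges quoted above follow from the 37-character alphabet (space, 0-9, A-Z). As a quick sanity check, and not part of the original script, this sketch recomputes them and compares them with the 16-bit and 32-bit limits; the variable names are made up for illustration.

```perl
use strict;
use warnings;

my $alphabet = 37;                # ' ', 0..9, 'A'..'Z'
my $triples  = $alphabet ** 3;    # number of three-character strings
my $sixes    = $alphabet ** 6;    # number of six-character strings

# The largest code is one less than the number of strings.
# %s is used so large values print correctly whatever the integer size.
printf "max 3-char code: %s (16-bit limit %s)\n", $triples - 1, 2**16 - 1;
printf "max 6-char code: %s (32-bit limit %s)\n", $sixes - 1,   2**32 - 1;
```

Both maxima sit below their limits, which is why "n" (16-bit) and "N" (32-bit) are wide enough as pack templates.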

Although I haven't benchmarked them, my gut tells me that the second is faster. What's more, the second can run within a "use integer" block, which allows Perl to use fast integer maths. The first will not run within "use integer" because the combined value can exceed the signed 32-bit range and overflow.

#!/usr/bin/perl
use strict;
use warnings;

{
    use bytes;

    my %lookup;
    my @reverse;
    my $scale;

    BEGIN {
        my $x = 0;
        my @chars = (' ', 0..9, 'A'..'Z');
        keys %lookup = 65_536; # preallocate hash buckets
        for my $i (@chars) {
            for my $j (@chars) {
                for my $k (@chars) {
                    $lookup{ $i.$j.$k } = $x;
                    $reverse[$x++] = $i.$j.$k;
                }
            }
        }
        $scale = $x;
    }

    # Functions using multiplication...
    sub alphanum_to_bytes_M {
        my $head = $lookup{ substr($_[0], 0, 3) };
        my $tail = $lookup{ substr($_[0], 3, 3) };
        my $n = ($head * $scale) + $tail;
        pack(N => $n)
    }

    sub bytes_to_alphanum_M {
        my $n = unpack(N => $_[0]);
        my $head = int($n / $scale);
        my $tail = $n - ($head * $scale);
        $reverse[$head] . $reverse[$tail]
    }

    # Functions using bitshifting...
    {
        use integer;

        sub alphanum_to_bytes_B {
            my $head = $lookup{ substr($_[0], 0, 3) };
            my $tail = $lookup{ substr($_[0], 3, 3) };
            pack(nn => $head, $tail)
        }

        sub bytes_to_alphanum_B {
            join q{}, @reverse[ unpack(nn => $_[0]) ]
        }
    }

    # Function to pretty-print byte strings for display purposes...
    sub show_bytes {
        my ($str) = @_;
        sprintf 'bytes[%s]',
            join q{ },
            map { sprintf('%02x', ord(substr($str, $_, 1))) }
            0 .. length($str)-1
    }
}

# Each input must be exactly six characters from [ 0-9A-Z];
# shorter values need to be padded with spaces.
my @lines = <DATA>;

print "MULTIPLICATION:\n";
foreach (@lines) {
    my $b;
    chomp;
    printf(
        "'%s' => '%s' => '%s'\n",
        $_,
        show_bytes($b = alphanum_to_bytes_M($_)),
        bytes_to_alphanum_M($b),
    );
}

print "BIT SHIFTING:\n";
foreach (@lines) {
    my $b;
    chomp;
    printf(
        "'%s' => '%s' => '%s'\n",
        $_,
        show_bytes($b = alphanum_to_bytes_B($_)),
        bytes_to_alphanum_B($b),
    );
}

__DATA__
0
1
A
Z
ABCDEF
ABCDEG
ZAAAAA
ZZZZZZ

Replies are listed 'Best First'.
Re^2: Fast - Compact That String
by tobyink (Canon) on Feb 10, 2012 at 00:11 UTC

    I've had some time to benchmark it now. Run on an input of one million strings, the multiplying version takes 35 seconds and the bitshifting version takes 28 seconds. This confirms my hunch that the bitshifting version is faster, by about 20% it seems.
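For anyone wanting to repeat the comparison, here is a rough sketch of how it could be set up with the core Benchmark module. It is not the benchmark that produced the figures above: the workload (10,000 random strings, encoding only), the iteration count and the shortened sub names are all assumptions, and the timings will differ from machine to machine.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @chars = (' ', 0 .. 9, 'A' .. 'Z');
my (%lookup, $scale);
{
    my $x = 0;
    for my $i (@chars) {
        for my $j (@chars) {
            for my $k (@chars) {
                $lookup{"$i$j$k"} = $x++;
            }
        }
    }
    $scale = $x;    # number of three-character strings
}

# Encoders restated from the post so that this sketch runs on its own.
sub to_bytes_M {
    pack(N => $lookup{ substr($_[0], 0, 3) } * $scale
            + $lookup{ substr($_[0], 3, 3) });
}

sub to_bytes_B {
    pack(nn => $lookup{ substr($_[0], 0, 3) },
               $lookup{ substr($_[0], 3, 3) });
}

# Assumed workload: 10_000 random six-character strings.
my @input = map { join q{}, map { $chars[ rand @chars ] } 1 .. 6 } 1 .. 10_000;

# Run each encoder 20 times over the whole input and print a comparison table.
cmpthese(20, {
    multiplication => sub { my $p; $p = to_bytes_M($_) for @input },
    bitshifting    => sub { my $p; $p = to_bytes_B($_) for @input },
});
```

A larger iteration count, or a negative one such as -5 (meaning "at least five CPU seconds per entry"), should give steadier numbers.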

    Out of curiosity, I tried CountZero's XS version on the same processor, expecting it to be faster. But it took 124 seconds: significantly slower than either of the pure Perl versions. I imagine that the overhead of object construction/destruction, along with its use of regular expressions, slows it down.

    (PS: I know the so-called bitshifting version doesn't use the bitshift operators. My initial implementation used "N" as the template for pack and unpack and used bitshifting to separate out the head and tail parts. I tweaked it to use "nn", which allowed me to drop the bitshifting, but I've kept referring to it as the bitshifting version anyway.)
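To illustrate the PS, this is roughly what a shift-based variant with "N" as the template might look like. It is a reconstruction from the description rather than the code that was actually dropped, and the sub names are hypothetical. It is deliberately kept outside "use integer", since shifting a head value as large as 50652 left by 16 bits needs the full unsigned 32-bit range.

```perl
use strict;
use warnings;

# Same kind of lookup tables as in the post: every three-character string
# over ' ', 0-9, A-Z gets a code in 0 .. 50_652.
my (%lookup, @reverse);
{
    my @chars = (' ', 0 .. 9, 'A' .. 'Z');
    my $x = 0;
    for my $i (@chars) {
        for my $j (@chars) {
            for my $k (@chars) {
                $lookup{"$i$j$k"} = $x;
                $reverse[$x++]    = "$i$j$k";
            }
        }
    }
}

# Hypothetical names; the post does not show this variant.
sub alphanum_to_bytes_shift {
    my $head = $lookup{ substr($_[0], 0, 3) };
    my $tail = $lookup{ substr($_[0], 3, 3) };
    pack(N => ($head << 16) | $tail);    # head goes into the high 16 bits
}

sub bytes_to_alphanum_shift {
    my $n = unpack(N => $_[0]);
    $reverse[ $n >> 16 ] . $reverse[ $n & 0xFFFF ];
}

my $packed = alphanum_to_bytes_shift('ABCDEF');
print length($packed), " bytes, round trip: ",
      bytes_to_alphanum_shift($packed), "\n";
```

Because "N" and "n" are both big-endian, this yields the same four bytes as pack(nn => $head, $tail), which is why the shifts could be dropped without changing the output.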
