I'm employing Perl for a lot of data crunching, converting between various data formats, etc. There's a nice snippet in the Cookbook named bin2dec - to convert binary numbers (given as strings) to decimals. Like "10011" to "19".

This bin2dec has a limitation of 32 bits, however, because it internally converts to an integer. What I need is bin2dec for binary strings of any length.

Using a full big number library like Math::Bigint for it is a bit of an overkill since I only need binary to decimal conversion. So, I've come up with the following trick:

print bin2dec("1110010101011101111011110110101010111"); # Calculate a (arbitrarily large) decimal number from its # binary representation # sub bin2dec { my ($str) = @_; my $ret = ""; $ret = mul2($ret, $_) foreach split(//, $str); return $ret; } # Given a (arbitrarily large) decimal number N as a string, # returns 2N or 2N+1 # sub mul2 { my ($str, $add_one_f) = @_; defined($add_one_f) or $add_one_f = 0; my $ret = ""; foreach (my $i = length($str) - 1; $i >= 0; --$i) { my $c = substr($str, $i, 1) * 2; $c += 1 if ($add_one_f); $ret = ($c % 10) . $ret; $add_one_f = ($c >= 10); } return $add_one_f ? '1' . $ret : $ret; }
The idea is that there are no long overflows in binary to decimal conversion. I'm shifting the binary number from left to right (msbits first), multiplying by 2 for each shift and adding 1 if the bit is 1. So, all I need is to implement multiplication by 2 (with a possible addition of 1) on string decimals, and that's quite simple (as demonstrated in mul2) because for these operations the carry can be maximum 1.

I'll be glad if someone finds this code useful.

Naturally, ideas for improvements/optimization will be welcomed. My initial implementation was in C++ so this code is a direct translation, and as such can feel "C++ish". More Perlish techniques will be welcomed.

Replies are listed 'Best First'.
Re: bin2dec for big numbers
by Aristotle (Chancellor) on Jan 18, 2005 at 12:11 UTC

    Starting out with a value and modifying it piecemeal as you go through a list of something, which you do with $ret and the digits in both functions, is an ideal candidate for List::Util's reduce function.

    use List::Util qw( reduce ); sub bin2dec { my ( $str ) = @_; reduce { mul2( $a, $b ) } "", split //, $str; }

    Here, in the first iteration, $a is the empty string from the list passed to reduce, and $b is the first digit. The return value of the block is passed back to the block as $a in the next iteration, while $b goes through the all the remaining values from the list one by one.

    $add_one_f I'd rename as $have_overflow, which documents intent rather than implementation. And if you only use it as a boolean, you don't need to initialize it either.

    The for(;;) loop isn't necessary, you don't use the index anywhere, you just extract each corresponding character once. So you could rewrite that much more clearly as for( reverse split //, $str ). Then it could be turned into a reduce in the same way as above, but since you're just accumulating characters onto the front of a string, you are better served with a join reverse map.

    sub mul2 { my ( $str, $have_overflow ) = @_; my $ret = join '', reverse map { my $c = $_ * 2 + ( $have_overflow ? 1 : 0 ); $have_overflow = ( $c >= 10 ); ( $c % 10 ); } reverse split //, $str; ( $have_overflow ? 1 : '' ) . $ret; }

    You could even get rid of the out-of-band overflow check at the bottom if you rearrange things a little:

    sub mul2 { my ( $str, $have_overflow ) = @_; join '', reverse map { my $c = $have_overflow ? 1 : 0; $c += $_ * 2 if length; $have_overflow = ( $c >= 10 ); # return a zero char only if not at top of string ( $c % 10 ) or ( length() ? 0 : '' ); } reverse '', split //, $str; }

    But in this case I find it obfuscates more than it reduces redundancy.

    Incidentally, the ( $have_overflow ? 1 : '' ) above is a no-op if $have_overflow is a boolean like in this case, since 1 and the empty string are what the boolean ops return in Perl for true and false respectively. So that could be written $have_overflow . $ret, though that lacks the the benefit of documented intent.

    Makeshifts last the longest.

      Thanks for the insightful reply.

      reduce definitely makes the code much clearer. I wonder how it is in terms of performance versus the bare loop.

      I wholeheartedly agree with the variable name change you are proposing. has_overflown makes the code clearer.

      Regarding changing the for loop with index to one w/o an index but on reverse, I just have a performance concern. reverse probably takes linear time in string length, and with an explicit for loop that goes from the end of the string to its beginning the cost is saved.

      Also, in your proposed implementation of mul2 with join reverse map I'm not sure if it's indeed clearer than the for loop. Again, the issue of performance rises, since you have two reverses extra.

        ObCounter: if you're so concerned about speed, why are you doing numerics in Perl? :-) In case of the reverse join, I don't see why your code is better anyway: you are constantly replacing the string by a character+string.

        In any case, it could probably be sped up a little by using string reverse vs list reverse:

        sub mul2 { my ( $str, $have_overflow ) = @_; my $ret = reverse join '', map { my $c = $_ * 2 + ( $have_overflow ? 1 : 0 ); $have_overflow = ( $c >= 10 ); ( $c % 10 ); } split //, scalar reverse $str; ( $have_overflow ? 1 : '' ) . $ret; }

        Makeshifts last the longest.

Re: bin2dec for big numbers (use Math::BaseCalc)
by grinder (Bishop) on Jan 18, 2005 at 11:37 UTC
    More Perlish techniques will be welcomed.

    I don't mean to rain on your parade, but I think that one of the most Perlish techniques is to see whether anyone else has already written it... enter Math::BaseCalc.

    #! /usr/bin/perl -w use strict; use Math::BaseCalc; my $bin = Math::BaseCalc->new( digits => [0,1] ); my $num; for $num( @ARGV ) { print "$num base 2 = ", $bin->from_base($num), " base 10\n"; }

    When run, the above produces:

    % ./b2d 1110010101011101111011110110101010111 1110010101011101111011110110101010111 base 2 = 123140435287 base 10

    And you can be sure that the module has received extensive testing...

    Update: you're right, if the number to be converted is big enough, it will use scientific notation. You learn something every day.

    - another intruder with the mooring in the heart of the Perl

      I looked into Math::BaseCalc, and I might be missing something, but it doesn't look to answer my requirements.

      It seems that Math::BaseCalc is also limited in the size of numbers it can handle. Try giving a very long string to it, say 200 binary digits. It returns results in scientific notation (...e+...), and those of course are far from being precise. It's probably limited to 32 or 64 bits (depends on what Perl's int is)

      My bin2dec, OTOH, always returns an exact decimal number.

      So, am I missing something ?

      Thanks for this, I will keep it in mind for the future.

      The original implementation was in C++ so BaseCalc was out of the question - I just translated it to Perl to demonstrate the tecnique to fellow monks and ask about clever usage of Perl's operators to make the code less C++ like.

Re: bin2dec for big numbers
by hv (Prior) on Jan 18, 2005 at 14:24 UTC

    You may be able to get rather better speed with some simplistic string manipulation. Consider:

    sub mul2 { my($str, $add) = @_; (my $carry = $str) =~ tr/0-9/0000011111/; (my $str2 = "0$str") =~ tr/0-9/0246802468/; my $result = $str2 | "$carry$add"; $result =~ s/^0//; $result; }

    Hugo

Re: bin2dec for big numbers
by merlyn (Sage) on Jan 18, 2005 at 10:33 UTC
    That's not "bin2dec". That's "bin2number". A common mistake, but a mistake nonetheless.

    It's Perl that's doing "number2dec" when the number is being printed for you to read it. But before that, it's not "decimal". It's just a quantity without a number base.

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.


    update: For those of you that didn't follow my 4am garbled reference... I was talking about the "cookbook" bin2dec, which is misnamed. Yes, the bignum code included in this post is in fact bin-string-to-dec-string. Cool.

      Well no. He is converting a base-2 string to a base-10 string, explicitly working character by character on strings in both cases. Perl is internally doing the base-10 to base-2^32 and back thing as he works through the base-10 digits, but the algorithm itself is indeed bin2dec.

      Makeshifts last the longest.

      I don't understand. Using your language, it must then be number2number, since a perl string of 0s and 1s is no more 'bin' than a string of 0s-9s is a 'dec'.

      IMHO it's a matter of definitions, and on what abstraction level you think about it. I think about a string of 0s and 1s as a binary number in my internal represenation. The same way, I think about a string of 0s - 9s as a decimal string in my internal representation.

      What Perl does for me is just print ASCII characters on the screen.

Re: bin2dec for big numbers
by TilRMan (Friar) on Feb 26, 2005 at 18:40 UTC

    Nothing overkill about using a module.

    use Math::BigInt; sub bin2dec { my ($bin) = @_; return Math::BigInt->new("0b$bin"); }