in reply to Faster Luhn Check Digit Calculation?

I agree Inline::C will be fastest, but lookup tables can help a pure-Perl implementation a lot.

LUHN.pm:

package LUHN; use warnings; use strict; use List::Util qw(sum); use Data::Dump qw(dd); sub cd_kschwab { # pm#1226545 use integer; my $total = 0; my $flip = 1; foreach my $c (reverse split //, shift) { $c *= 2 unless $flip = !$flip; while ($c) { $total += $c % 10; $c = $c / 10; } } return (10 - $total % 10) % 10; } my @x2_cast_out_9 = ( # 0 1 2 3 4 5 6 7 8 9 # input number n 0, 2, 4, 6, 8, 1, 3, 5, 7, 9 # output: n * 2, cast out 9 ); my @check_digit = map { (10 - $_ % 10) % 10 } 0 .. 9 * 15; sub cd_AnomalousMonk { my $ccn = shift; my $total = 0; for (1, 3, 5, 7, 9, 11, 13) { $total += substr($ccn, $_, 1); } for (0, 2, 4, 6, 8, 10, 12, 14) { $total += $x2_cast_out_9[ substr($ccn, $_, 1) ]; } return $check_digit[ $total ]; } sub cd_BrowserUk { # pm#1226582 use integer; my $s = $_[ 0 ]; my $total = 0; for my $i ( 0 .. 14 ) { my $d = substr( $s, $i, 1 ); unless( $i & 1 ) { $d *= 2; $d -= 9 if $d > 9; } $total += $d; } $total *= 9; return chop $total; } 1;

test_LUHN_1.pl:

# test_LUHN_1.pl 01dec18waw use warnings; use strict; use Benchmark qw(cmpthese); use LUHN; for ('000' .. '999') { my $ccn = '401135000000' . $_; my $kschwab_cd = LUHN::cd_kschwab ($ccn); my $AnomalousMonk_cd = LUHN::cd_AnomalousMonk ($ccn); die "$ccn: kschwab $kschwab_cd != $AnomalousMonk_cd" if $kschwab_cd != $AnomalousMonk_cd; } print "tested cc number range kschwab vs. AnomalousMonk ok \n"; for ('000' .. '999') { my $ccn = '401135000000' . $_; my $kschwab_cd = LUHN::cd_kschwab ($ccn); my $BrowserUk_cd = LUHN::cd_BrowserUk ($ccn); die "$ccn: kschwab $kschwab_cd != $BrowserUk_cd" if $kschwab_cd != $BrowserUk_cd; } print "tested cc number range kschwab vs. BrowserUk ok \n"; cmpthese (-1, { cd_kschwab => sub { LUHN::cd_kschwab ('401135000000000') }, cd_AM => sub { LUHN::cd_AnomalousMonk ('401135000000000') }, cd_BrowserUk => sub { LUHN::cd_BrowserUk ('401135000000000') }, });

Output:

c:\@Work\Perl\monks\kschwab>perl test_LUHN_1.pl tested cc number range kschwab vs. AnomalousMonk ok tested cc number range kschwab vs. BrowserUk ok Rate cd_kschwab cd_BrowserUk cd_AM cd_kschwab 52764/s -- -40% -62% cd_BrowserUk 88033/s 67% -- -37% cd_AM 139999/s 165% 59% --

Update: My solution and some of the others are hard-wired for generating a checksum for a 15-digit string. Here's a generalized variation. Caution: this is not well tested. Also, I don't know that I like the  @check_digit look-up array all that much; it might chew up a lot of space in certain circumstances. BrowserUk's  $total *= 9;  return chop $total; eats less space and is only very slightly slower than the lookup.

sub cd_AM_HOP { # iterator solution my ($n_digits, # n of digits for which to generate luhn checksum ) = @_; my $order = $n_digits & 1; my @straight = grep +($_ & $order), 0 .. $n_digits-1; # dd '---', + \@straight; my @adjusted = grep !($_ & $order), 0 .. $n_digits-1; # dd '===', + \@adjusted; my @check_digit = map { (10 - $_ % 10) % 10 } 0 .. 9 * $n_digits; return sub { my $ccn = shift; # digits for which to generate checksum my $total = 0; for (@straight) { $total += substr($ccn, $_, 1); } for (@adjusted) { $total += $x2_cast_out_9[ substr($ccn, $_, 1) ]; } # return $check_digit[ $total ]; # return check digit $total *= 9; return chop $total; # return check digit }; # end sub iterator }


Give a man a fish:  <%-{-{-{-<

Replies are listed 'Best First'.
Re^2: Faster Luhn Check Digit Calculation?
by kschwab (Vicar) on Dec 01, 2018 at 16:20 UTC
    Converted a hardcoded version of yours to Inline::C, and it's pretty much the same performance as BrowserUK's Inline::C:
    $ perl ./script
    Benchmark: timing 500 iterations of Inline::C (AnomalousMonk), Inline::C (BrowserUK)...
    Inline::C (AnomalousMonk):  6 wallclock secs ( 6.21 usr +  0.00 sys =  6.21 CPU) @ 80.52/s (n=500)
    Inline::C (BrowserUK):  6 wallclock secs ( 6.14 usr +  0.00 sys =  6.14 CPU) @ 81.43/s (n=500)
    
    Appreciate it! Conversion below:
    int cd4( char *ccn ) { int total = 0; int i; int co9[]={0,2,4,6,8,1,3,5,7,9}; int straight[]={1,3,5,7,9,11,13}; int adjusted[]={0,2,4,6,8,10,12,14}; for (i=0;i<7;i++) { total += ccn[straight[i]]-48; } for (i=0;i<8;i++) { total += co9[ccn[adjusted[i]]-48]; } total *=9; return total %10; }

      Here's another twist that might squeeze out a few more computrons: use array literals (if that's the correct terminology; introduced with C99 IIRC; I assume Inline::C supports C99) to compile the arrays rather than building local arrays on each subroutine call (if you don't want to just declare the arrays static| static const, which also works | should work). I haven't looked at the assembler output, but the idea (the hope?) is that the compiler would make these statics. There are two versions of the function below; both work. One is more heavily macro-ized because I wanted to see how far I could push this approach; for reasons of maintainability, I'm not sure I'd want it in production code. Only minimal testing has been done; I leave that to you :).

      t_array_literal_1.c:


      Give a man a fish:  <%-{-{-{-<

Re^2: Faster Luhn Check Digit Calculation?
by LanX (Saint) on Dec 01, 2018 at 21:40 UTC
    Lookup tabled were my first idea too ... but why not a string of digits together at the same time?

    Am I missing something?

    6 digits are 1 million entries, probably using substr is even faster than array lookup, for sure it's demanding less memory.

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery FootballPerl is like chess, only without the dice

      ... why not a string of digits together at the same time?

      If I correctly understand what you're saying, something like
          my $check_digits = join '', map { chr((10 - $_ % 10) % 10) } 0 .. 9 * $n_digits;
          ...
          return substr $check_digits $total, 1;
      (returning the check digit as a character) certainly would be a more memory-conservative approach than using an array of, e.g., 136 numbers. However, my suspicion is that when you get through with the substr call, you've burned about as much time as with the just-as-simple  $total *= 9;  return chop $total; calls.

      6 digits are 1 million entries ...

      I don't understand this point: where do the 1 million entries come from? Would you be building a lookup string from every possible 6-digit combination? The OPed question posited 15 digits; that's a really long string.


      Give a man a fish:  <%-{-{-{-<

        To be clearer, I suppose that

        L("XXYY") = ( L("XX") + L("YY") ) % 10

        for all even length substrings XX and YY

        => a divide and conquer could be applied

        => computation complexity replaced with memory complexity.

        > The OPed question posited 15 digits; that's a really long string.

        One would do 3 lookups, leading zeros of the left most chunk shouldn't influence the checksum.

        L("XX") = L("0XX") = L("00XX") ) = ...

        > where do the 1 million entries come from?

        6 digits sound like a good compromise, you'd need even amount and 100 Meg for 8 digits sound exaggerated. (A Perl array has an factor 10 overhead IIRC)

        Anyway a C algorithm which fits into the line cache of the processor might still be faster.

        Not sure if a 10k string for a 4 digit partition would always fit into the L1 cache.

        update

        the best approach might be a C program which dynamically benchmarks different chunk sizes to test the effect of line caches before building the lookup table.

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery FootballPerl is like chess, only without the dice