gooch has asked for the wisdom of the Perl Monks concerning the following question:

Oh highly esteemed monks, hear my plea for knowledge.
I am attempting to duplicate in Perl the checksum functionality in the following C code.
    #include <stdio.h>

    /************************************************************************/
    /* checksum -- verify checksum                                          */
    /************************************************************************/
    int checksum (buff, bufflen)
    char *buff ;
    int bufflen ;
    {
        int ctr ;
        int retval ;
        char tmpstr [5] ;
        long strtol () ;
        unsigned int chk_sum = (unsigned) 0 ;
        int twos_comp ;

        ctr = 0 ;
        while (ctr < (bufflen - 5))
        {
            chk_sum = chk_sum + (buff [ctr] & 0x7f) ;
            ctr ++ ;
        }
        strncpy (tmpstr, &buff [bufflen - 5], 4) ;
        tmpstr [4] = '\0' ;
        twos_comp = (int) strtol (tmpstr, (char **) NULL, 16) ;
        retval = (((chk_sum + twos_comp) & 0xFFFF) == 0) ? 0 : 1 ;
        return (retval) ;
    }
Problem? I know ZIP about C, and the definitions I have found for the various functions (all of which appear to be "standard" C ) don't give me enough understanding of what the C function actually does to allow me to replicate the functionality in Perl.
I have gotten as far as the second to last couple of lines, and there I am completely stumped.

Sample input consists of <SOH>9999FF1B<ETX> where <SOH>=0x01 and <ETX>=0x3 respectively.
The inbound is an ASCII string.
Output is either a 0 or 1 (failure / success). I seek either of two things.
1. A plain English description of what the second to the last line of code is actually doing, step by step.
2. An example in Perl of what it is doing. I am quite willing to beat my head against the example until I garner understanding of what is actually happening.

A humble(d) acolyte,
Mike Gucciard

Replies are listed 'Best First'.
Re: Conversion of C code to Perl-ese.
by BrowserUk (Patriarch) on Aug 14, 2003 at 01:31 UTC

    Try this version.

    #! perl -slw
    use strict;

    sub checksum {
        my( $string ) = @_;
        my $chk_sum = 0;
        $chk_sum += $_ for unpack 'C*', substr $string, 0, -5;
        my $twos_comp = hex( substr $string, -5, 4 );
        return ( ( $chk_sum + $twos_comp ) & 0xFFFF ) ? 1 : 0;
    }

    print "$_ : ", checksum $_ for "\x019999FF1B\x03", "\x019998FF1B\x03";

    __END__
    P:\test>283701
    ?9999FF1B? : 0
    ?9998FF1B? : 1

    As for what is going on: the encoding routine adds up the (7-bit) ASCII values of the message, takes the 16-bit two's complement of the total, and appends the result, as four hex digits, to the end of the message.

    eg.

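    9999 = 57 + 57 + 57 + 57 = 228.
    ~228 = -229 decimal = FF1B hex (16-bit)
    Add the header and trailer characters = <SOH>9999FF1B<ETX>

    (Because <SOH> has the value 1, the one's complement of 228 is the same as the two's complement of the full total, 229.)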
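
    The arithmetic above can be checked with a few lines of Perl. This is only a sketch; the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Sum the 7-bit ASCII values of the payload "9999" (without <SOH>).
my $payload_sum = 0;
$payload_sum += $_ & 0x7F for unpack 'C*', '9999';

# One's complement of the payload sum, kept to 16 bits.
my $ones = ~$payload_sum & 0xFFFF;

# Two's complement of the sum including <SOH> (value 1), kept to 16 bits.
my $twos = ( 0x10000 - ( $payload_sum + 1 ) ) & 0xFFFF;

printf "sum=%d ones=%04X twos=%04X\n", $payload_sum, $ones, $twos;
```

    Both values should come out as FF1B, which is why complementing the payload sum alone is enough when the header byte is 1.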

    To check that the transmission was uncorrupted, the checksum routine totals up the (7-bit) ASCII values of the message, excluding the last 5 characters.

    1 + 57 + 57 + 57 + 57 = 229

    It then converts the four characters just before the trailing <ETX> back from hex

    FF1B = 65307

    Then adds the two together (discarding any bits greater than 16 which could happen with longer messages)

    (65307 + 229) = 65536
    65536 & 65535 = 0x10000 & 0xFFFF = 0

    If the result is 0, the checksum matched and the function returns 0 to indicate success--or perhaps a lack of failure:)
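
    The verification steps above can be traced one at a time. This is a sketch that assumes the sample message from the question; the variable names are illustrative only.

```perl
use strict;
use warnings;

my $msg = "\x019999FF1B\x03";    # <SOH>9999FF1B<ETX>

# Everything except the last five characters (four hex digits plus <ETX>).
my $body = substr $msg, 0, -5;
my $sum  = 0;
$sum += $_ & 0x7F for unpack 'C*', $body;    # 1 + 57 * 4

# The four hex digits just before <ETX>.
my $twos_comp = hex( substr $msg, -5, 4 );

# Add them and keep only the low 16 bits.
my $total  = $sum + $twos_comp;
my $masked = $total & 0xFFFF;

print "sum=$sum twos_comp=$twos_comp total=$total masked=$masked\n";
```

    A masked value of 0 is what the C function reports as 0; anything else is reported as 1.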

    A Perl implementation of the encoding routine might look like this:

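    sub build_string {
        my( $string ) = @_;
        my $chksum = 0;

        # Sum the 7-bit ASCII values of the payload.
        $chksum += $_ for map{ $_ & 0x7F } unpack 'C*', $string;

        # One's complement, kept to 16 bits. Because the <SOH> header is 1,
        # this equals the two's complement of the sum including <SOH>.
        $chksum = (~$chksum & 0xFFFF);

        # Zero-padded, uppercase hex so the field is always four digits.
        return chr(1) . $string . sprintf( '%04X', $chksum ) . chr(3);
    }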

    HTH.

    If the result isn't 0, then the checksum didn't match and the function returns 1, to indicate that corruption occurred?

    Note: The return value is backwards from your expectation, and mine, but that is what the code is doing (unless I am completely misinterpreting it, which is always possible!).


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
    If I understand your problem, I can solve it! Of course, the same can be said for you.

      Thank you all so very much for your efforts.
      I especially want to thank CombatSquirrel, and BrowserUK.
      Between the two of you I now not only understand more clearly what the C code does, but how it is doing it, and +++ to BrowserUK, I hadn't even gotten to the point of thinking that having a routine to encode with might be handy (For testing, say). That was a completely unexpected and much appreciated bonus.

      I am formally out of the doldrums, and sailing full speed ahead on my project.

      It is my hope that someday I will accumulate enough knowledge to be able to contribute in similar fashion, here in the Monastery, to the enlightenment of my fellow Monks.

      A much less confused Acolyte,
      Mike Gucciard
Re: Conversion of C code to Perl-ese.
by CombatSquirrel (Hermit) on Aug 13, 2003 at 22:54 UTC
    This should be it (no guarantees, you might want to verify):
    sub checksum ($) {
        my $string = shift;
        my @chars = split //, $string;
        my $sum;
        $sum += $_ & 0x7F for (@chars[0..@chars-5]); # add ASCII values up
        $sum += hex(join('', @chars[@chars-4..@chars-1])); # add the last four characters interpreted as hex number
        return (($sum & 0xFFFF == 0) ? 0 : 1);
    }

    Cheers, CombatSquirrel
      Hmmm, looks like I was on the right track all along, as I can follow your Perl pretty well...

      Unfortunately, that calls into question whether the C code was ever doing what it purported to do in the first place. The routine should fail if, for example, the inbound string is <SOH>9998FF1B<ETX>. As far as I can tell you have provided an exact duplicate of what the C code does, yet when I ran the above string through the provided subroutine, I received "0"...
      So...
      According to the description, the C code purports to do the following:
      "The Checksum is a series of four ASCII-hexidecimal characters which provide a check on the integrity of all characters preceeding it. The four characters represent a 16-bit binary count which is the 2's complimented sum of the 8-bit binary representations of the message characters. The data integrity check is done by converting the four checksum characters into a 16-bit binary, and adding the 8-bit binary representation of the message characters to it. The binary result should be zero."
      Implied, is that any value other than zero indicates a checksum failure.

      Can anyone tell me if the C code is actually doing that, or for that matter how CombatSquirrel's kindly provided code sample could be made to perform this (to me rather arcane) act?

      Still out of my depth here, but at least someone threw me a floatation device.

      Thanks CombatSquirrel.
      Mike Gucciard
        Yes, I believe that the C code is doing what your documentation describes. However, your assumption about return values may be incorrect. Looking at the C code, I believe the function returns 0 if the checksum succeeds (is correct) and 1 if the checksum fails (is incorrect).

        See if that matches your sample results.

        Not documented in the C code is what happens if bufflen is less than 5 or greater than the actual size of buff. You may have trouble getting Perl to replicate the behavior in these cases (accessing random memory on the stack).
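
        As a rough sketch of how the sub posted above could be adjusted to follow the C function more closely: this version uses ord() to get each character's code, stops summing before the four hex digits and reads them from just before the trailing <ETX>, and parenthesizes the mask because == binds more tightly than &. The name checksum_strict is hypothetical, and treating short or malformed input as a failure is an assumption, since the C code leaves that case undefined.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 0 when the checksum matches and 1 otherwise,
# mirroring the C function's return values.
sub checksum_strict {
    my ($string) = @_;

    # Assumption: anything too short to hold a checksum field fails.
    return 1 if !defined $string || length($string) < 5;

    # The four characters before the final one must be hex digits.
    my $hex = substr $string, -5, 4;
    return 1 unless $hex =~ /\A[0-9A-Fa-f]{4}\z/;

    # Sum the 7-bit character codes of everything before the checksum field.
    my $sum = 0;
    $sum += ord($_) & 0x7F for split //, substr( $string, 0, -5 );
    $sum += hex $hex;

    # Parentheses matter here: == has higher precedence than &.
    return ( ( $sum & 0xFFFF ) == 0 ) ? 0 : 1;
}

print checksum_strict("\x019999FF1B\x03"), "\n";    # good message
print checksum_strict("\x019998FF1B\x03"), "\n";    # one digit changed
print checksum_strict("abc"), "\n";                 # too short
```

        With this version the altered string <SOH>9998FF1B<ETX> is reported as a failure, which is what the quoted description calls for.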

        -- Eric Hammond

Re: Conversion of C code to Perl-ese.
by hsmyers (Canon) on Aug 14, 2003 at 04:51 UTC
    My copy of C Pocket Reference: C Syntax and Fundamentals, by Peter Prinz and Ulla Kirch-Prinz (translated by Tony Crawford). O'Reilly Books, 0-596-00436-2, shows the following information:

    long strtol (const char *s, char **pptr, int base );

    Defined as:

    Converts a string to a number with type long. The third parameter is the base of the numeral string, and may be an integer between 2 and 36, or 0. If base is 0, the string s is interpreted as a numeral in base 8, 16, or 10 depending whether it begins with 0, 0x or one of the digits 1 to 9.

    The analogous functions of converting a string to unsigned long, long long(*) or unsigned long long(*) are strtoul()(*), strtoll()(*) and strtoull()(*).

    char *strncpy (char *s1, const char *s2, size_t n );

    Defined as:

    Copies the first n characters of s2 to the char array s1. The string terminator character '\0' is not appended.
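
    For comparison, here is a rough Perl sketch of the same two operations as the C code uses them, assuming the sample message from the question. hex() and oct() are Perl built-ins; the variable names are illustrative.

```perl
use strict;
use warnings;

my $buff = "\x019999FF1B\x03";    # <SOH>9999FF1B<ETX>

# strncpy(tmpstr, &buff[bufflen - 5], 4): copy four characters, starting
# five from the end. Perl strings carry their own length, so no '\0' is needed.
my $tmpstr = substr $buff, length($buff) - 5, 4;

# strtol(tmpstr, NULL, 16): read the string as a base-16 number.
my $twos_comp = hex $tmpstr;

# oct() picks the base from a prefix (0x hex, 0b binary, otherwise octal),
# loosely like strtol with base 0 but without the plain-decimal case.
my $from_prefix = oct '0xFF1B';

print "$tmpstr $twos_comp $from_prefix\n";
```

    The explicit tmpstr [4] = '\0' in the C code exists only because strncpy does not terminate the copy; the Perl version has no equivalent step.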

    Update: fix formatting.

    --hsm

    "Never try to teach a pig to sing...it wastes your time and it annoys the pig."
Re: Conversion of C code to Perl-ese.
by gooch (Monk) on Aug 13, 2003 at 22:34 UTC
    So, even after 5 different previews, I still botched my first question post...
    Item 1. of what I am seeking should read:
    1. A plain English description of what the section of code from "twos_comp = ..." to "return (retval)" is doing, step by step.

    Try as I might, I cannot fight my way to understanding of this last section.
    Mike Gucciard