moensch has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks, I have managed to create a function that returns a different result the second time you call it with the same parameter. To cut things short, here it is:
    sub getChecksum($) {
        my $in = shift;
        my $check = 0;
        $in =~ m/(?:(.)(?{$check += ord($1);}))*^/;
        return $check;
    }
The function is supposed to return the sum of the decimal ASCII values of all characters in the given string $in. Call it the first time (say, with the string foo) and it returns 324; call it again and it returns 0. I then thought that maybe the code in ?{} is not executed at all, but try replacing the regex with this:
    $in =~ m/(?:(.)(?{print "ASCII value of '$1' is: ".ord($1)."\n"; $check += ord($1);}))*^/;
You'll see that even on the second call, it evaluates the code in ?{}, but does not increment $check.
    getChecksum called with: 'foo'
    ASCII value of 'f' is: 102
    ASCII value of 'o' is: 111
    ASCII value of 'o' is: 111
    will return '324'
    getChecksum called with: 'foo'
    ASCII value of 'f' is: 102
    ASCII value of 'o' is: 111
    ASCII value of 'o' is: 111
    will return '0'
Surely I could replace the regex with this:
    my @chars = split( //, $in );
    foreach my $char (@chars) {
        $check += ord($char);
    }
But who would want this when it should(?) work in the regex? ;-) I am running this on Debian etch (perl v5.8.8). Anybody who has an explanation for this is allowed to run to the shop and buy himself a cookie :D

Re: Weird behaviour with match-time code evaluation and backtracking
by Corion (Patriarch) on Mar 05, 2008 at 08:16 UTC

    Under Perl 5.10 you will get the warning that $check cannot be closed over:

    Q:\> perl -wle "sub c{my$c=0;$_[0]=~m/(?:(.)(?{$c += ord($1);}))*^/;$c}c('foo')"
    Variable "$c" will not stay shared at (re_eval 1) line 1.

    I'm not sure what the generic workaround is, but I avoid code in regular expressions. In the general case, I would use a split+reduce approach instead. In your specific case, the checksum can also be calculated using unpack:

    sub getChecksum { unpack '%A*', $_[0] };
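A minimal sketch of the two alternatives mentioned in this reply, for comparison. The sub names `checksum_reduce` and `checksum_unpack` are made up for illustration, and the `%32C*` template is a variant of the `%A*` one above, not the same template: a `%` prefix without a bit count gives a 16-bit checksum, so an explicit 32 is used here to leave more headroom for long strings. Both subs assume byte strings.

```perl
use strict;
use warnings;
use List::Util qw(reduce);

# split+reduce: fold the characters into a running sum, starting at 0.
sub checksum_reduce {
    my ($in) = @_;
    return reduce { $a + ord($b) } 0, split //, $in;
}

# unpack: "%32" asks for a 32-bit checksum of the values that "C*"
# (all unsigned chars) would otherwise have returned as a list.
sub checksum_unpack {
    my ($in) = @_;
    return unpack '%32C*', $in;
}

for my $word (qw(foo foobar)) {
    printf "%-6s reduce=%d unpack=%d\n",
        $word, checksum_reduce($word), checksum_unpack($word);
}
# foo    reduce=324 unpack=324
# foobar reduce=633 unpack=633
```

Neither version involves a regexp code block, so the repeated-call problem from the original question does not arise.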
Re: Weird behaviour with match-time code evaluation and backtracking
by BrowserUk (Patriarch) on Mar 05, 2008 at 09:16 UTC

    Use our instead of my and your code will work as you expect.

    C:\test\DBMNested>p1
    [0] Perl> sub getChecksum($) {
        my $in = shift;
        our $check = 0;
        $in =~ m/(?:(.)(?{$check += ord($1);}))*^/;
        return $check;
    };;

    print getChecksum( 'foo' );;
    324
    print getChecksum( 'foo' );;
    324
    print getChecksum( 'foobar' );;
    633

    Though it's a very slow way to achieve your goal. Corion's unpack tip is the way to go.


      That should be local our $check;. Let's not clobber our caller's variables.

      sub getChecksum($) {
          my $in = shift;
          local our $check = 0;
          $in =~ m/(?:(.)(?{$check += ord($1);}))*^/;
          return $check;
      }
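A small sketch of why the `local` matters. It assumes the caller happens to keep its own value in the package variable `$main::check`; the value 42 is made up for illustration. With `local our`, that value is restored when the sub returns, whereas a plain `our $check = 0` would overwrite it.

```perl
use strict;
use warnings;

# Hypothetical caller-side state living in the same package variable.
$main::check = 42;

sub getChecksum($) {
    my $in = shift;
    # "local" saves the current value of $main::check and restores it
    # when this sub returns; "our" makes the short name usable under strict.
    local our $check = 0;
    $in =~ m/(?:(.)(?{$check += ord($1);}))*^/;
    return $check;
}

my $first  = getChecksum('foo');
my $second = getChecksum('foo');

print "first=$first second=$second\n";   # first=324 second=324
print "caller still sees $main::check\n"; # caller still sees 42
```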
Re: Weird behaviour with match-time code evaluation and backtracking
by ikegami (Patriarch) on Mar 05, 2008 at 09:39 UTC

    Three tidbits to add to what's already been said:

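    The (?{...}) and (??{...}) regexp assertions are closures. They capture the lexical scope that existed when the regexp was compiled. Since this pattern is a constant, it is compiled only once, so the code block stays bound to the $check of the first call, and the fresh $check that each later call creates is never incremented. Using a package variable instead of a lexical one (BrowserUk's solution) thus avoids the problem.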
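The capture-once effect can be reproduced without any regexp. The sketch below is an analogy rather than the regexp case itself: a named sub nested inside another sub is also compiled only once, and it triggers the same "will not stay shared" diagnostic that Corion quoted. The names `outer` and `inner` are made up for illustration.

```perl
use strict;
use warnings;
no warnings 'closure';   # silence: Variable "$x" will not stay shared

sub outer {
    my $x = shift;
    # A named sub is compiled once, so it stays bound to the $x
    # that belonged to the first call of outer().
    sub inner { return $x }
    return inner();
}

my $first  = outer(1);
my $second = outer(2);   # inner() still sees the first call's $x

print "first=$first second=$second\n";   # first=1 second=1
```

The second call gets a fresh `$x`, just as the second call to getChecksum gets a fresh `$check`, but the code compiled earlier keeps looking at the old one.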

    Your code fails if there are newlines in your input. "." won't match a newline without the s modifier.
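A quick way to see the newline point in isolation. This is a simplified sketch that collects characters with a list-context /g match instead of the code-block pattern from the question; the sample string is made up for illustration.

```perl
use strict;
use warnings;

my $text = "fo\no";   # "foo" with a newline in the middle

my @without_s = $text =~ /(.)/g;    # "." never matches the newline
my @with_s    = $text =~ /(.)/gs;   # /s lets "." match "\n" as well

my $sum_without = 0;
$sum_without += ord for @without_s;

my $sum_with = 0;
$sum_with += ord for @with_s;

print scalar(@without_s), " chars, sum $sum_without\n";   # 3 chars, sum 324
print scalar(@with_s),    " chars, sum $sum_with\n";      # 4 chars, sum 334
```

The difference of 10 is the ASCII value of the newline that the first match left out.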

    Aside from unpack, Digest::CRC provides a number of known checksum algorithms in a tested package. Implemented in C (with a Perl fallback) and dedicated to the task, Digest::CRC and unpack should be much faster than your regexp solution.

Re: Weird behaviour with match-time code evaluation and backtracking
by stiller (Friar) on Mar 05, 2008 at 09:04 UTC
    Instead of:
        my @chars = split( //, $in );
        foreach my $char (@chars) {
            $check += ord($char);
        }
    You could use:
        $check += ord $_ for split //, $in;
    But as to your question, I don't know. I think I once read an explanation by Dominus on the mailing list for his superb book Higher Order Perl (which I highly recommend!)

    Edit: In light of BrowserUk's response below, it's unlikely that this is related to the explanation I mentioned above. Sorry for the confusion...