Weird behaviour with match-time code evaluation and backtracking

moensch has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks, I have managed to create a function which returns a different result the second time you call it with the same parameter. To cut things short, here it is:

sub getChecksum($) {
        my $in = shift;
        my $check = 0;
        $in =~ m/(?:(.)(?{$check += ord($1);}))*^/;
        return $check;
}

The function is supposed to return the sum of the decimal ASCII values of all characters in the given string $in. Call it the first time (say, with the string foo) and it returns 324; call it again and it returns 0. I then thought that maybe the code in (?{}) is not executed at all, but try replacing the regex with this:

$in =~ m/(?:(.)(?{print "ASCII value of '$1' is: ".ord($1)."\n"; $check += ord($1);}))*^/;

You'll see that even on the second call, it evaluates the code in ?{}, but does not increment $check.

getChecksum called with: 'foo'
ASCII value of 'f' is: 102
ASCII value of 'o' is: 111
ASCII value of 'o' is: 111
will return '324'

getChecksum called with: 'foo'
ASCII value of 'f' is: 102
ASCII value of 'o' is: 111
ASCII value of 'o' is: 111
will return '0'

Surely I could replace the regex with this:

        my @chars = split( //, $in );
        foreach my $char (@chars) {
                $check += ord($char);
        }

But who would want this when it should(?) work in the regex? ;-) I am running this on Debian etch (perl v5.8.8). Anybody who has an explanation for this is allowed to run to the shop and buy himself a cookie :D

Re: Weird behaviour with match-time code evaluation and backtracking by Corion (Patriarch) on Mar 05, 2008 at 08:16 UTC
Under Perl 5.10 you will get the warning that `$check` cannot be closed over:

        Q:\> perl -wle "sub c{my$c=0;$_[0]=~m/(?:(.)(?{$c += ord($1);}))*^/;$c}c('foo')"
        Variable "$c" will not stay shared at (re_eval 1) line 1.

I'm not sure what the generic workaround is, but I avoid code in regular expressions. In the general case, I would use a `split`+`reduce` approach instead. In your specific case, the checksum can also be calculated using unpack:

        sub getChecksum { unpack '%A*', $_[0] };
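For reference, here is a minimal sketch of the two regex-free alternatives mentioned in this reply. The function names are made up for illustration. The `%32C*` template is the checksum form described in the `unpack` documentation; the 32-bit width is an assumption, chosen so that long strings do not wrap around as they would with the default 16 bits. Both versions assume byte-oriented (ASCII or Latin-1) input.

```perl
use strict;
use warnings;
use List::Util qw(reduce);

# Hypothetical name: sum of character codes via unpack's checksum prefix.
# "%32" asks for a 32-bit checksum; "C*" treats every byte as an unsigned char.
sub checksum_unpack {
    my ($in) = @_;
    return unpack('%32C*', $in);
}

# Hypothetical name: the split + reduce approach.
# The leading 0 seeds the accumulator, so an empty string yields 0.
sub checksum_reduce {
    my ($in) = @_;
    return reduce { $a + ord($b) } 0, split(//, $in);
}

print checksum_unpack('foo'), "\n";   # 324
print checksum_reduce('foo'), "\n";   # 324
```

Neither version keeps any state between calls, so calling them repeatedly with the same string gives the same result.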
Re: Weird behaviour with match-time code evaluation and backtracking by BrowserUk (Patriarch) on Mar 05, 2008 at 09:16 UTC
Use `our` instead of `my` and your code will work as you expect.

        C:\test\DBMNested>p1
        [0] Perl> sub getChecksum($) {
                my $in = shift;
                our $check = 0;
                $in =~ m/(?:(.)(?{$check += ord($1);}))*^/;
                return $check;
        };;
        print getChecksum( 'foo' );;
        324
        print getChecksum( 'foo' );;
        324
        print getChecksum( 'foobar' );;
        633

Though it's a very slow way to achieve your goal. Corion's unpack tip is the way to go.
Re^2: Weird behaviour with match-time code evaluation and backtracking by ikegami (Patriarch) on Mar 05, 2008 at 09:42 UTC
That should be `local our $check;`. Let's not clobber our caller's variables.

        sub getChecksum($) {
                my $in = shift;
                local our $check = 0;
                $in =~ m/(?:(.)(?{$check += ord($1);}))*^/;
                return $check;
        }
Re: Weird behaviour with match-time code evaluation and backtracking by ikegami (Patriarch) on Mar 05, 2008 at 09:39 UTC
Three tidbits to add to what's already been said:

The `(?{...})` and `(??{...})` regexp assertions are closures. They capture the lexical scope that existed when the regexp was compiled. Using a package variable instead of a lexical variable (BrowserUk's solution) thus avoids the problem.

Your code fails if there are newlines in your input. "`.`" won't match a newline without the `s` modifier.

Aside from `unpack`, Digest::CRC provides a number of known checksum algorithms in a tested package. Implemented in C (with a Perl fallback) and dedicated to the task, Digest::CRC and `unpack` should be much faster than your regexp solution.
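Putting the first two points together, the regexp version could be sketched as follows. This is only an illustration, not a recommendation over `unpack`. The name `get_checksum` is made up, and the pattern is the one from the question with the `s` modifier added.

```perl
use strict;
use warnings;

# Hypothetical name: the original regexp approach with the fixes from this thread.
sub get_checksum {
    my ($in) = @_;

    # A package variable, so the code block does not depend on capturing a
    # lexical; "local" restores any caller's $check when the sub returns.
    local our $check = 0;

    # /s lets "." match newlines as well. The trailing ^ cannot match once
    # the group has consumed characters, so the engine backtracks to
    # position 0; by then the code block has run once per character.
    $in =~ m/(?:(.)(?{ $check += ord($1); }))*^/s;

    return $check;
}

print get_checksum('foo'), "\n";    # 324
print get_checksum('foo'), "\n";    # 324 on the second call too
print get_checksum("a\nb"), "\n";   # 205, the newline is counted
```

Because `$check` is a package variable here, the result no longer depends on which lexical the code block captured when the regexp was compiled.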
Re: Weird behaviour with match-time code evaluation and backtracking by stiller (Friar) on Mar 05, 2008 at 09:04 UTC
Instead of:

        my @chars = split( //, $in );
        foreach my $char (@chars) {
                $check += ord($char);
        }

you could use:

        $check += ord $_ for split //, $in;

But as to your question, I don't know. I think I once read an explanation by Dominus on the mailing list for his superb book Higher Order Perl (which I highly recommend!).

Edit: In light of BrowserUk's response, it's unlikely that this is related to the explanation I mentioned above. Sorry for the confusion.