This all started out as just a little proof-of-concept for my personal amusement. I set out to create a regexp that uses the (?{....}) construct to parse a string of bits of arbitrary length, and return their decimal value. This sort of thing already exists with unpack and vec, but curiosity prevailed, and I just wanted to see what a regexp approach would look like in the end. Ultimately, the regexp engine isn't accomplishing much aside from iterating over each character in the bit-string. Plain old Perl code within the (?{....}) brackets is doing the work that might just as easily be done outside of the regexp. But that notwithstanding, it was entertaining to tinker with.
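For comparison, here is a minimal sketch of the conventional pack/unpack route alluded to above. The helper name bin_to_dec_unpack is hypothetical (it is not part of the original snippets), and the sketch assumes the bit string is at most 32 digits long with the high-order bit on the left.

```perl
use strict;
use warnings;

# Hypothetical helper, not from the original post: converts a bit string
# of up to 32 digits to its decimal value using pack/unpack.
sub bin_to_dec_unpack {
    my $bits = shift;
    die "$bits is not a pure bit string.\n" if $bits =~ m/[^10]/;
    die "$bits is longer than 32 bits.\n"   if length($bits) > 32;

    # Left-pad with zeros to exactly 32 digits.
    my $padded = substr( ( '0' x 32 ) . $bits, -32 );

    # 'B32' packs the digits high-order bit first into 4 bytes;
    # 'N' reads those 4 bytes back as a big-endian unsigned 32-bit integer.
    return unpack( 'N', pack( 'B32', $padded ) );
}

print bin_to_dec_unpack('1101100'), "\n";    # prints 108
```

Unlike the regexp version, this sketch is deliberately capped at 32 bits; longer strings would need a different approach.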

The gadget works great... under some conditions. It is the failing cases that have me perplexed enough to post this SoPW. First I'll present a working example:

    use strict;
    use warnings;

    print bin_to_dec('1101100');

    sub bin_to_dec {
        my $bits = shift;
        my( $power, $magnitude, $num );
        die "$bits is not a pure bit string.\n" if $bits =~ m/[^10]/;
        if ( $bits =~ m/
                (?{ $power = length($_) - 1; })
                (?:
                    ([10])
                    (?{
                        $magnitude = 2 ** $power;
                        $^N eq '1' and $num += $magnitude;
                        $power--;
                    })
                )+
            /x
        ) {
            return $num;
        }
        else {
            die "Unable to resolve bits: $bits.\n";
        }
    }

The output is '108', as you would expect, assuming the high-order bit is on the left. This subroutine (laboriously explicit, in the spirit of providing an easy-to-understand snippet) works great for any bit string from one digit to over a thousand binary digits.
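To make concrete what the embedded code blocks are doing, here is a rough sketch of the same accumulation written as an ordinary loop outside the regexp. The name bin_to_dec_loop is made up for illustration, and this is only a restatement of the arithmetic, not an explanation of any regexp engine behavior.

```perl
use strict;
use warnings;

# Hypothetical helper, not from the original post: the same
# power-of-two accumulation, done with a plain loop instead of (?{...}).
sub bin_to_dec_loop {
    my $bits = shift;
    die "$bits is not a pure bit string.\n" if $bits =~ m/[^10]/;

    my $power = length($bits) - 1;    # exponent of the leftmost digit
    my $num   = 0;

    for my $bit ( split //, $bits ) {
        # Add this digit's place value when the digit is set.
        $num += 2 ** $power if $bit eq '1';
        $power--;
    }
    return $num;
}

print bin_to_dec_loop('1101100'), "\n";    # prints 108
```

Each pass through the loop corresponds to one iteration of the (?: ... )+ group in the regexp version, with $bit standing in for $^N.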

But look what happens when I read from the filehandle <DATA> to test a series of binary strings.

    use strict;
    use warnings;

    while ( <DATA> ) {
        chomp;
        print bin_to_dec($_), "\n";
    }

    sub bin_to_dec {
        my $bits = shift;
        my( $power, $magnitude, $num );
        die "$bits is not a pure bit string.\n" if $bits =~ m/[^10]/;
        if ( $bits =~ m/
                (?{ $power = length($_) - 1; $num = 0; })
                (?:
                    ([10])
                    (?{
                        $magnitude = 2 ** $power;
                        $^N eq '1' and $num += $magnitude;
                        $power--;
                    })
                )+
            /x
        ) {
            return $num;
        }
        else {
            die "Unable to resolve bits: $bits.\n";
        }
    }

    __DATA__
    00000000
    00000011
    00000111
    11100000

    __OUTPUT__
    0
    Use of uninitialized value in print at test.pl line 9, <DATA> line 2.
    Use of uninitialized value in print at test.pl line 9, <DATA> line 3.
    Use of uninitialized value in print at test.pl line 9, <DATA> line 4.

I have tried to isolate the quirk by putting a print statement for $num inside the (?{....}) construct, and as I hoped, $num does get the appropriate value there. But when I put a print "$num\n"; just before the subroutine's return in the second snippet, $num has no value. In the first snippet there is no problem.

I have also tried using $num (and the other variables used within the regexp) as package globals, with our, as well as with use vars, thinking that maybe lexical scoping was causing my pain. In so doing, I declared those variables at the top of the script to give them the broadest possible exposure. Again, no change; the second snippet fails, and the first snippet works great.

So I turn to you guys to see if anyone else can confirm or deny this funky behavior. I'm using ActiveState Perl 5.8.4 on WinXP.


Dave


In reply to Inexplicable uninitialized value when using (?{...}) regexp construct. by davido
