Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Raw data communication bytes are stored in a string. I'm trying to "unstuff" DLE characters by replacing pairs of DLE with a single DLE. Looks like the match part is working, but the replacement string is not expanding properly ??? Other suggestions also welcomed. Thanks !
#!/usr/bin/perl -w
use strict;
use constant DLE=>0x10;
my $dle = sprintf ("%02X", DLE);
my $msg = pack ("CCCCC", 0x31, 0x32, 0x10, 0x10, 0x33);
# match $dle's expanded, replacement is not expanded ?
$msg =~ s/\x{$dle}\x{$dle}/\x{$dle}/g;
printf ("%02x " x length($msg) . "\n", unpack ("C*", $msg));

[scott@localhost test]$ ./test.pl
Illegal hexadecimal digit '$' ignored at ./test.pl line 7.
31 32 00 33
[scott@localhost test]$

Replies are listed 'Best First'.
Re: Why is variable interpolation suppressed in \x{$xxx} replacement ?
by Abigail-II (Bishop) on Sep 29, 2003 at 14:33 UTC
    Welcome to Perl's world of little languages!

    There are many little languages inside Perl, and some little languages even have little languages inside them! Regular expressions for instance are a little language inside Perl, different semantics, different syntax, different grammar. But inside this little language, there are several other little languages. For instance, the content of [ ], doesn't follow the syntax and semantic rules of the rest of the regular expressions. Another little language inside regular expressions is the \x{ } construct. Different syntax/semantics here as well. And one of the differences is that they don't interpolate variables.
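A minimal sketch of that last point, assuming the hex digits are held in a variable as in the question: in a double-quoted string the digits inside \x{ } have to be written out literally, while chr(hex(...)) builds the same character from the variable at run time.

```perl
use strict;
use warnings;

my $dle = sprintf("%02X", 0x10);   # the two-character string "10"

my $literal = "\x{10}";            # hex digits written out in the source
my $runtime = chr(hex($dle));      # same character, computed from $dle at run time

print $literal eq $runtime ? "same\n" : "different\n";   # prints "same"
```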

    Having said that, it's still possible to do what you want, once you've evalled strings repeatedly:

    #!/usr/bin/perl -w
    use strict;
    use constant DLE=>0x10;
    my $dle = sprintf ("%02X", DLE);
    my $msg = pack ("CCCCC", 0x31, 0x32, 0x10, 0x10, 0x33);
    # match $dle's expanded, replacement is not expanded ?
    $msg =~ s/(??{ eval qq!"\\x{$dle}\\x{$dle}"! })/qq!"\\x{$dle}"!/gee;
    printf ("%02x " x length($msg) . "\n", unpack ("C*", $msg));
    __END__
    31 32 10 33

    Abigail

Re: Why is variable interpolation suppressed in \x{$xxx} replacement ?
by broquaint (Abbot) on Sep 29, 2003 at 14:48 UTC
    I believe this is because the \x{...} is evaluated at compile-time, and the characters between the braces aren't interpolated, e.g.
    use strict;
    use warnings;

    use constant DLE => 0x10;

    my $dle = sprintf ("%02X", DLE);

    print "before\n";
    print "string is: [\x{$dle}]\n";
    __output__
    Illegal hexadecimal digit '$' ignored at pmsopw_294950.pl line 9.
    before
    string is: []
    A simpler solution would be just to continue using pack, e.g.
    use strict;
    use warnings;

    my $msg = pack "CCCCC", 0x31, 0x32, 0x10, 0x10, 0x33;
    my $c = pack "C", 0x10;
    $msg =~ s/$c$c/$c/g;
    printf ("%02x " x length($msg) . "\n", unpack ("C*", $msg));
    __output__
    31 32 10 33
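As a rough sketch, the same pack-based character can be wrapped in a pair of helpers to check that unstuffing really undoes stuffing. The names stuff and unstuff are made up for this illustration and do not come from the thread or from any module.

```perl
use strict;
use warnings;
use constant DLE => 0x10;

my $DLE_CHAR = pack "C", DLE;

# Hypothetical helpers, named for illustration only.
# stuff() doubles every DLE; unstuff() collapses each DLE pair to one DLE.
sub stuff   { my ($s) = @_; $s =~ s/$DLE_CHAR/$DLE_CHAR$DLE_CHAR/g; return $s }
sub unstuff { my ($s) = @_; $s =~ s/$DLE_CHAR$DLE_CHAR/$DLE_CHAR/g; return $s }

my $raw     = pack "C*", 0x31, 0x32, 0x10, 0x33;
my $stuffed = stuff($raw);
my $back    = unstuff($stuffed);

print join(" ", map { sprintf "%02x", $_ } unpack "C*", $stuffed), "\n";   # 31 32 10 10 33
print $back eq $raw ? "round trip ok\n" : "mismatch\n";                    # round trip ok
```

If the escape byte were ever one that is special inside a regex, the pattern side would need \Q...\E around the variable; DLE (0x10) is not such a character.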

    HTH

    _________
    broquaint

Re: Why is variable interpolation suppressed in \x{$xxx} replacement ?
by sgifford (Prior) on Sep 29, 2003 at 16:16 UTC
    One way to do it is to just use chr to get the character you want, then use that:
    #!/usr/bin/perl -w
    use strict;
    use constant DLE=>0x10;
    use vars qw($DLE_C);
    $DLE_C = chr(DLE);
    my $msg = pack ("CCCCC", 0x31, 0x32, 0x10, 0x10, 0x33);
    # match $dle's expanded, replacement is not expanded ?
    $msg =~ s/$DLE_C$DLE_C/$DLE_C/g;
    printf ("%02x " x length($msg) . "\n", unpack ("C*", $msg));

    Another way is to use capturing inside the RE, then just replace with the captured part:

    $msg =~ s/(\x{$dle})\x{$dle}/$1/g;
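For reference, here is that one-liner dropped into the original test script as a self-contained sketch. It relies on the behaviour observed in the question: on the pattern side $dle is interpolated before the regex is compiled, so the regex engine sees \x{10}, and the replacement side only needs $1.

```perl
use strict;
use warnings;
use constant DLE => 0x10;

my $dle = sprintf("%02X", DLE);
my $msg = pack("C*", 0x31, 0x32, 0x10, 0x10, 0x33);

# The pattern is built as (\x{10})\x{10} once $dle has been interpolated;
# the replacement reuses the captured DLE instead of spelling out \x{...}.
$msg =~ s/(\x{$dle})\x{$dle}/$1/g;

printf("%02x " x length($msg) . "\n", unpack("C*", $msg));   # 31 32 10 33
```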

    As for why it doesn't work the way you expect, Abigail-II seems to have already given a better answer than I could.

      I like the chr() approach; however, I was concerned about getting tangled up in multibyte unicode problems. It *looks* like if I use chr(???) with a ??? <= 255 I will always get the single byte I am looking for (i.e. not translated to/from some type of unicode symbol set). Correct ?
      With respect to your second suggestion:
      1. Yes, I also had this thought. I was originally trying to run with .../go; (for efficiency) and I was concerned that the captured string would not be inserted without recompilation. I was probably mistaken ... :)
      2. The fact that this works implies that the initial \x{$dle} IS interpolated and the replacement one IS NOT. Abigail-II's explanation did not cover why one is interpolated and the other is not. Seems unduly inconsistent - even for perl :)
      Thanks to all ! (Is it customary to reply with "thanks" (only), or is that considered unnecessary babble ?) Thanks, Scott.
        It *looks* like if I use chr(???) with a ??? <= 255 I will always get the single byte I am looking for (i.e. not translated to/from some type of unicode symbol set). Correct ?
        Well... in a way... yes. But you're overlooking one thing: if Perl concatenates a UTF-8 string with a Latin-1 string (at least, that's the only way to think about it that makes sense), Perl will convert the Latin-1 string to UTF-8. Let me show you with an example:
        ($\, $,) = ("\n", " ");            # set up output mode
        $string = "A" . chr(180) . "B";    # Latin-1
        print unpack "C*", $string;
        $string .= chr(367);               # UTF-8
        print unpack "C*", $string;
        Output:
        65 180 66
        65 194 180 66 197 175
        
        As you can see, the original chr(180), between chr(65) ("A") and chr(66) ("B"), is converted to UTF-8, resulting in two bytes.

        So, if you want UTF-8, all you have to do is insert the characters into a UTF-8 string, or concatenate it with a UTF-8 string. That may even be a zero-length string, as returned by pack "U0":

        ($\, $,) = ("\n", " ");            # set up output mode
        $string = "A" . chr(180) . "B";    # Latin-1
        print unpack "C*", $string;
        $string .= pack "U0";              # zero length, UTF-8
        print unpack "C*", $string;
        Result:
        65 180 66
        65 194 180 66
        

        p.s. This was tested with perl 5.6.1. on Windows. Not that it matters much — it shouldn't, except that you need at least perl 5.6.
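A hedged sketch for newer perls: as far as I know, from 5.10 on unpack "C*" works on characters rather than on the internal bytes, so the first example above would not print the same numbers there. Assuming that, one way to look at both views is to compare the character values with an explicitly encoded copy:

```perl
use strict;
use warnings;

my $string = "A" . chr(180) . "B";            # three characters, all <= 255
my @before = map { ord } split //, $string;

$string .= chr(367);                          # appending a wide character
my @after  = map { ord } split //, $string;

# The character values do not change; only the internal storage does.
print "@before\n";                            # 65 180 66
print "@after\n";                             # 65 180 66 367

# To see the UTF-8 bytes, encode a copy explicitly.
my $copy = $string;
utf8::encode($copy);
my @bytes = unpack "C*", $copy;
print "@bytes\n";                             # 65 194 180 66 197 175
```

For the unstuffing problem this suggests that a message built only from chr() values up to 255 (or from pack "C*") keeps comparing and matching as the same characters, whatever Perl does internally.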