silentq has asked for the wisdom of the Perl Monks concerning the following question:

Hi. I'm dealing with a problem where hexadecimal character references such as '\x{FEFF}' and 'chr(0xFEFF)' are ignored when they appear inside a regex expression.

However, I can copy hexadecimal or decimal references into a Perl variable, insert that variable into a regex, and then it works. Here's an example:

$string = 'ABC';
$chr65 = chr(65);
$chr97 = chr(97);
$string =~ s/$chr65/$chr97/;
print "$string\n"; # Input = 'ABC', Output = 'aBC'

Others seem to be able to quote these references directly inside a regex expression and have them be understood. I'm wondering if I might need to be referencing a specific package name in my script. Can anyone help?

Thanks,

Replies are listed 'Best First'.
Re: Hexadecimal character references not understood inside a regex
by tobyink (Canon) on May 28, 2013 at 15:19 UTC

    The following ought to work in any vaguely recent version of Perl...

    #!/usr/bin/env perl
    use strict;
    use warnings;

    my $string = "ABC";
    $string =~ s/\x{41}/\x{61}/g;
    print $string, "\n"; # aBC
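One likely reason 'chr(0xFEFF)' appears to be ignored: a regex interpolates variables, but it does not call functions, so the text chr(0xFEFF) inside a pattern is matched as literal characters. The sketch below is only an illustration (the variable names are made up for this example, and it assumes the string already holds characters rather than undecoded bytes). It compares the \x{FEFF} escape, a literal chr(...) in the pattern, and an interpolated variable:

```perl
use strict;
use warnings;

my $bom  = chr(0xFEFF);      # one character, U+FEFF
my $text = $bom . "ABC";     # character string that starts with U+FEFF

# \x{FEFF} is an escape the regex engine understands directly.
(my $escaped = $text) =~ s/^\x{FEFF}//;

# A function call is not interpolated: this pattern looks for the literal
# letters "chr" followed by a group matching "0xFEFF", so nothing is removed.
(my $literal = $text) =~ s/^chr(0xFEFF)//;

# Interpolating a variable that already holds the character works.
(my $interpolated = $text) =~ s/^$bom//;

print join(" ", length($escaped), length($literal), length($interpolated)), "\n";
# prints: 3 4 3
```

The middle result stays at length 4 because the pattern never matched, which is consistent with the behaviour described in the question.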
Re: Hexadecimal character references not understood inside a regex
by choroba (Cardinal) on May 28, 2013 at 15:18 UTC
    The $string is probably a Unicode string. It makes a difference:
    perl -CO -wE '
        for my $s ( "á", do { use utf8; "á" } ) {
            say $s, $s =~ $_ ? 1 : 0 for qr/\xc3\xa1/, qr/\xe1/
        }'

    Output in a utf-8 terminal:

    á1
    á0
    á0
    á1
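The same difference can be shown without relying on the terminal or on use utf8. This is a minimal sketch, assuming the data arrives as UTF-8 bytes and is decoded with the core Encode module; the variable names are only for illustration:

```perl
use strict;
use warnings;
use Encode qw(decode);

my $bytes = "\xC3\xA1";                # the two UTF-8 bytes that encode U+00E1
my $chars = decode('UTF-8', $bytes);   # one character, U+00E1

# The same pattern is tried against the undecoded and the decoded string.
my $bytes_match = $bytes =~ /\xE1/ ? "matches" : "does not match";
my $chars_match = $chars =~ /\xE1/ ? "matches" : "does not match";

printf "bytes: length %d, /\\xE1/ %s\n", length($bytes), $bytes_match;
printf "chars: length %d, /\\xE1/ %s\n", length($chars), $chars_match;
# prints:
# bytes: length 2, /\xE1/ does not match
# chars: length 1, /\xE1/ matches
```

A code point escape in a pattern only matches once the string has been decoded into characters; against raw bytes it is compared with each byte separately.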
Re: Hexadecimal character references not understood inside a regex
by Jim (Curate) on May 28, 2013 at 20:45 UTC
    I'm dealing with a problem where hexadecimal character references such as '\x{FEFF}' and 'chr(0xFEFF)' are ignored when they appear inside a regex expression [sic].

    If you're trying to match the Unicode byte order mark, then Perl must understand the text is Unicode and not in some other national or vendor coded character set. Does it? Show us more of your real code. I suspect your problem has less to do with hexadecimal escape sequences in regular expression patterns and more to do with decoding and encoding of input and output. Just a hunch…

    There's also this idiom…

    use charnames qw( :full );
    m/\N{BYTE ORDER MARK}/;
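To make the decoding point concrete, here is a hedged sketch of what may be happening. It assumes the input is UTF-8 text that begins with a byte order mark, and it uses an in-memory filehandle as a stand-in for the real file, which the question does not show:

```perl
use strict;
use warnings;

# Stand-in for a UTF-8 file that starts with a BOM (bytes EF BB BF).
my $raw = "\xEF\xBB\xBFABC\n";

# Without a decoding layer Perl sees three separate bytes,
# so a pattern for the single character U+FEFF cannot match.
open my $in_bytes, '<', \$raw or die "open: $!";
my $undecoded = <$in_bytes>;
close $in_bytes;
my $hit_bytes = $undecoded =~ /^\x{FEFF}/ ? 1 : 0;

# With :encoding(UTF-8) the three bytes are decoded to one character,
# and the same pattern can find it and remove it.
open my $in_chars, '<:encoding(UTF-8)', \$raw or die "open: $!";
my $decoded = <$in_chars>;
close $in_chars;
my $hit_chars = $decoded =~ s/^\x{FEFF}// ? 1 : 0;

print "raw bytes: $hit_bytes, decoded: $hit_chars\n";
# prints: raw bytes: 0, decoded: 1
```

If the real script reads its input without a decoding layer, that alone could explain why \x{FEFF} seems to be ignored.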