in reply to Weird quoting with /x modifier

I did the following:
use strict;
use warnings;
use YAPE::Regex::Explain;

my $regex1 = qr{\QHello# World\E}x;
my $parser1 = YAPE::Regex::Explain->new($regex1)->explain;
print "$parser1\n";
print "*" x 20;
print "\n";
my $regex2 = qr{Hello\#\ World}x;
my $parser2 = YAPE::Regex::Explain->new($regex2)->explain;
print "$parser2\n";
This returned:
The regular expression:

(?x-ims:Hello\#\ World\\E)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?x-ims:                 group, but do not capture (disregarding
                         whitespace and comments) (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n):
----------------------------------------------------------------------
  Hello                    'Hello'
----------------------------------------------------------------------
  \#                       '#'
----------------------------------------------------------------------
  \                        ' '
----------------------------------------------------------------------
  World                    'World'
----------------------------------------------------------------------
  \\                       '\'
----------------------------------------------------------------------
  E                        'E'
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
********************
The regular expression:

(?x-ims:Hello\#\ World)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?x-ims:                 group, but do not capture (disregarding
                         whitespace and comments) (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n):
----------------------------------------------------------------------
  Hello                    'Hello'
----------------------------------------------------------------------
  \#                       '#'
----------------------------------------------------------------------
  \                        ' '
----------------------------------------------------------------------
  World                    'World'
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
Note the addition of the "\E" at the end of the first explanation, but not the second.

Note also that the "\Q" is dropped by YAPE::Regex::Explain.

The behaviour is consistent across 5.8.9, 5.10.1 and 5.12.1 with YAPE::Regex::Explain 3.011.

HTH

UPDATE:
From perldoc perlre:

(?#text)
A comment. The text is ignored. If the /x modifier enables whitespace formatting, a simple # will suffice. Note that Perl closes the comment as soon as it sees a ), so there is no way to put a literal ) in the comment.
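
To make the perlre excerpt concrete, here is a minimal sketch (not part of the test script above; the variable names are made up for illustration) comparing an unescaped # with an escaped one under /x:

```perl
use strict;
use warnings;

# Unescaped '#': under /x everything from '#' to the end of the pattern
# is a comment, so this is effectively just /Hello/.
my $commented = qr{Hello# World}x;

# Escaped '#' and escaped space stay literal even under /x.
my $literal = qr{Hello\#\ World}x;

for my $input ('Hello', 'Hello# World') {
    printf "%-16s commented=%d literal=%d\n",
        "'$input'",
        ($input =~ /\A$commented\z/ ? 1 : 0),
        ($input =~ /\A$literal\z/   ? 1 : 0);
}
```

The commented pattern should match only the bare 'Hello', and the escaped pattern only 'Hello# World'.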

Re^2: Weird quoting with /x modifier
by proceng (Scribe) on May 29, 2010 at 22:40 UTC
    Interesting addition:
    With the following code:
    use strict;
    use warnings;

    my $teststring = qq/Hello# World/;
    my $regex1 = qr{\QHello# World\E}x;
    my $regex2 = qr{Hello\#\ World}x;
    my $regex3 = qr{$teststring}x;
    my $regex4 = qr{\Q$teststring\E}x;
    print "String is $teststring\n";
    print "Regex1 is $regex1\n";
    print "Regex2 is $regex2\n";
    print "Regex3 is $regex3\n";
    print "Regex4 is $regex4\n";
    I get the following output (perl 5.12.1):
    String is Hello# World
    Regex1 is (?x-ism:Hello\#\ World\\E)
    Regex2 is (?x-ism:Hello\#\ World)
    Regex3 is (?x-ism:Hello# World
    )
    Regex4 is (?x-ism:Hello\#\ World)
    For regex3, the newline before the closing parenthesis is inserted by the qr operator, not by the source file, as verified by a hex dump of the file.

    UPDATE:
    Output from perl 5.8.9:

    $ perl8 regex.pl
    String is Hello# World
    Regex1 is (?x-ism:Hello\#\ World\\E
    )
    Regex2 is (?x-ism:Hello\#\ World
    )
    Regex3 is (?x-ism:Hello# World
    )
    Regex4 is (?x-ism:Hello\#\ World
    )
    Output from perl 5.10.1:
    $ perl10 regex.pl
    String is Hello# World
    Regex1 is (?x-ism:Hello\#\ World\\E)
    Regex2 is (?x-ism:Hello\#\ World)
    Regex3 is (?x-ism:Hello# World
    )
    Regex4 is (?x-ism:Hello\#\ World)
    So, it looks like there may be an edge case that was partially fixed.

    UPDATE 2: clarified that initial tests were on 5.12.1
    This is perl 5, version 12, subversion 1 (v5.12.1) built for x86_64-linux-thread-multi-ld
    This is perl, v5.10.1 (*) built for x86_64-linux-thread-multi-ld
    This is perl, v5.8.9 built for x86_64-linux-thread-multi-ld
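
To check whether a given perl shows the stray \E reported above, something like the following sketch may help. It is an illustration only, not part of the original test script: it stringifies the same pattern as Regex1 and looks for the quoted backslash followed by E. The $affected variable is a made-up name.

```perl
use strict;
use warnings;

# Same literal pattern as Regex1 in the test script above.
my $re = qr{\QHello# World\E}x;

# An affected perl stringifies the pattern with a quoted backslash
# followed by 'E' just before the closing parenthesis.
my $affected = ("$re" =~ /\\\\E/) ? 1 : 0;

# Report the running perl version and the stringified pattern.
printf "perl %vd: %s\n", $^V, "$re";
print $affected
    ? "stray \\E present: \\E did not terminate \\Q after the #\n"
    : "no stray \\E: \\Q...\\E behaved as expected\n";
```

The result depends on the perl version, so no particular output is assumed here.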

Re^2: Weird quoting with /x modifier
by ykar (Acolyte) on May 29, 2010 at 22:25 UTC

    >> Note the addition of the "\E" at the end of the first explanation, but not the second.

    This is exactly what is bothering me. As far as I understand, \Q and \E are just syntactic sugar to avoid typing dozens of escapes, so neither \Q nor \E should appear in the compiled regular expression.

    Regarding your excerpt from perlre: it is not my case. I'm intentionally quoting #.

    The "#" character is also treated as a metacharacter introducing a comment, just as in ordinary Perl code. This also means that if you want real whitespace or "#" characters in the pattern (outside a character class, where they are unaffected by "/x"), then you'll either have to escape them (using backslashes or "\Q...\E") or encode them using octal or hex escapes.

    By the way, hex escapes are not working inside /\Q...\E/ for me. I worked around my problem by replacing /\Q...#...\E/x with /\Q...\E\#\Q...\E/x.

    Thank you for your answer!
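
Another way around the problem, suggested by the Regex4 result earlier in the thread (where \Q$teststring\E came out correctly), is to quote the text in a variable before interpolating it. The sketch below assumes that approach; literal_x_pattern is a hypothetical helper name, not something from the thread or from any module.

```perl
use strict;
use warnings;

# Hypothetical helper: build an /x pattern that matches $text literally,
# allowing optional whitespace on either side.
sub literal_x_pattern {
    my ($text) = @_;

    # quotemeta backslash-escapes every non-word character, so
    # 'Hello# World' becomes 'Hello\#\ World'.
    my $quoted = quotemeta $text;

    return qr{
        \A \s*
        $quoted
        \s* \z
    }x;
}

my $re = literal_x_pattern('Hello# World');
print "match\n"    if     '  Hello# World ' =~ $re;
print "no match\n" unless 'Hello'           =~ $re;
```

Because the # and the space are already escaped by the time the regex engine sees them, /x treats them as literals rather than as a comment start and ignorable whitespace.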