ykar has asked for the wisdom of the Perl Monks concerning the following question:

Please take a look at the following code:
$a = qr{\QHello# World\E}x;
$b = qr{Hello\#\ World}x;
$test = "Hello# World";
$test =~ $a && print "\$a matches \$test\n";
$test =~ $b && print "\$b matches \$test\n";
print "\$a = $a\n\$b = $b\n";
The output is:
$b matches $test
$a = (?x-ism:Hello\#\ World\\E)
$b = (?x-ism:Hello\#\ World)
Why are the regular expressions $a and $b not the same?

perlre says that \Q and \E should quote whitespace and the "#" character. It does quote the "#" character, but it then quotes absolutely everything after the "#", even the \E.

Thank you very much for your attention!

Re: Weird quoting with /x modifier
by proceng (Scribe) on May 29, 2010 at 20:20 UTC
    I did the following:
    use strict;
    use warnings;
    use YAPE::Regex::Explain;
    my $regex1 = qr{\QHello# World\E}x;
    my $parser1 = YAPE::Regex::Explain->new($regex1)->explain;
    print "$parser1\n";
    print "*" x 20;
    print "\n";
    my $regex2 = qr{Hello\#\ World}x;
    my $parser2 = YAPE::Regex::Explain->new($regex2)->explain;
    print "$parser2\n";
    This returned:
    The regular expression:
    (?x-ims:Hello\#\ World\\E)
    matches as follows:
    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?x-ims:                 group, but do not capture (disregarding
                             whitespace and comments) (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n):
    ----------------------------------------------------------------------
      Hello                    'Hello'
    ----------------------------------------------------------------------
      \#                       '#'
    ----------------------------------------------------------------------
      \                        ' '
    ----------------------------------------------------------------------
      World                    'World'
    ----------------------------------------------------------------------
      \\                       '\'
    ----------------------------------------------------------------------
      E                        'E'
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------
    ********************
    The regular expression:
    (?x-ims:Hello\#\ World)
    matches as follows:
    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?x-ims:                 group, but do not capture (disregarding
                             whitespace and comments) (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n):
    ----------------------------------------------------------------------
      Hello                    'Hello'
    ----------------------------------------------------------------------
      \#                       '#'
    ----------------------------------------------------------------------
      \                        ' '
    ----------------------------------------------------------------------
      World                    'World'
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------
    Note the addition of the "\E" at the end of the first explanation, but not the second.

    Note also that the "\Q" is dropped by YAPE::Regex::Explain.

    The behaviour is the same across 5.8.9, 5.10.1 and 5.12.1 with YAPE::Regex::Explain 3.011.

    HTH

    UPDATE:
    From perldoc perlre:

    (?#text)
    A comment. The text is ignored. If the /x modifier enables whitespace formatting, a simple # will suffice. Note that Perl closes the comment as soon as it sees a ), so there is no way to put a literal ) in the comment.
      Interesting addition:
      With the following code:
      use strict;
      use warnings;
      my $teststring = qq/Hello# World/;
      my $regex1 = qr{\QHello# World\E}x;
      my $regex2 = qr{Hello\#\ World}x;
      my $regex3 = qr{$teststring}x;
      my $regex4 = qr{\Q$teststring\E}x;
      print "String is $teststring\n";
      print "Regex1 is $regex1\n";
      print "Regex2 is $regex2\n";
      print "Regex3 is $regex3\n";
      print "Regex4 is $regex4\n";
      I get the following output (perl 5.12.1):
      String is Hello# World
      Regex1 is (?x-ism:Hello\#\ World\\E)
      Regex2 is (?x-ism:Hello\#\ World)
      Regex3 is (?x-ism:Hello# World
      )
      Regex4 is (?x-ism:Hello\#\ World)
      For Regex3, the newline is inserted by qr// itself (it is not part of the source string), as verified by a hex dump of the file.
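To make the Regex3 vs Regex4 difference concrete, here is a small sketch; the second sample string, "Hello there", is made up for illustration. When the raw string is interpolated under /x, the interpolated "#" starts a comment, so only "Hello" is left to match. With \Q$teststring\E the value is quoted first, so the "#" and the space stay literal.

```perl
use strict;
use warnings;

my $teststring = "Hello# World";

# Raw interpolation: under /x the interpolated "#" starts a comment,
# so the effective pattern is just "Hello".
my $regex3 = qr{$teststring}x;

# Interpolation inside \Q...\E: the value is quotemeta'd to
# Hello\#\ World, which /x leaves alone.
my $regex4 = qr{\Q$teststring\E}x;

for my $candidate ("Hello# World", "Hello there") {
    printf "%-14s regex3: %-3s regex4: %s\n",
        $candidate,
        ($candidate =~ $regex3 ? "yes" : "no"),
        ($candidate =~ $regex4 ? "yes" : "no");
}
```

Both patterns accept "Hello# World", but only Regex3 also accepts "Hello there", because everything after its "#" is a comment.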

      UPDATE:
      Output from perl 5.8.9:

      $ perl8 regex.pl
      String is Hello# World
      Regex1 is (?x-ism:Hello\#\ World\\E
      )
      Regex2 is (?x-ism:Hello\#\ World
      )
      Regex3 is (?x-ism:Hello# World
      )
      Regex4 is (?x-ism:Hello\#\ World
      )
      Output from perl 5.10.1:
      $ perl10 regex.pl
      String is Hello# World
      Regex1 is (?x-ism:Hello\#\ World\\E)
      Regex2 is (?x-ism:Hello\#\ World)
      Regex3 is (?x-ism:Hello# World
      )
      Regex4 is (?x-ism:Hello\#\ World)
      So, it looks like there may be an edge case that was partially fixed.

      UPDATE 2: clarified that initial tests were on 5.12.1
      This is perl 5, version 12, subversion 1 (v5.12.1) built for x86_64-linux-thread-multi-ld
      This is perl, v5.10.1 (*) built for x86_64-linux-thread-multi-ld
      This is perl, v5.8.9 built for x86_64-linux-thread-multi-ld

      >> Note the addition of the "\E" at the end of the first explanation, but not the second.

      This is exactly what is bothering me. As far as I understand, \Q and \E are just syntactic sugar to avoid typing dozens of escapes, so neither \Q nor \E should appear in the compiled regular expression.

      Regarding your excerpt from perlre: it does not apply to my case, since I'm intentionally quoting the #.

      The "#" character is also treated as a metacharacter introducing a comment, just as in ordinary Perl code. This also means that if you want real whitespace or "#" characters in the pattern (outside a character class, where they are unaffected by "/x"), then you'll either have to escape them (using backslashes or "\Q...\E") or encode them using octal or hex escapes.

      By the way, hex escapes don't work inside /\Q...\E/ for me. I worked around the problem by replacing /\Q...#...\E/x with /\Q...\E\#\Q...\E/x.
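A minimal sketch of that workaround, shown next to a quotemeta alternative that avoids putting a "#" inside a \Q...\E literal at all. The variable names and sample text are illustrative; only the /\Q...\E\#\Q...\E/x shape comes from the post above.

```perl
use strict;
use warnings;

my $test = "Hello# World";

# Workaround from the post: close \Q before the '#', escape it by hand,
# then reopen \Q for the rest of the literal.
my $split = qr{\QHello\E\#\Q World\E}x;

# Alternative: quote the text at run time with quotemeta, then
# interpolate it. quotemeta escapes both '#' and ' ', so /x ignores neither.
my $quoted  = quotemeta "Hello# World";
my $runtime = qr{$quoted}x;

print "split matches\n"   if $test =~ $split;
print "runtime matches\n" if $test =~ $runtime;
```

The quotemeta route is essentially what Regex4 in the previous reply does, since interpolated values are quoted after the literal has been parsed.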

      Thank you for your answer!

Re: Weird quoting with /x modifier
by choroba (Cardinal) on May 29, 2010 at 18:37 UTC
    I get the same results with perl 5.10.0. In perl 5.8.8, the results are different:
    $b matches $test
    $a = (?x-ism:Hello\#\ World\\E
    )
    $b = (?x-ism:Hello\#\ World
    )
    Seems like a bug to me. What is the code's behaviour in 5.12?

      If nobody can explain this behavior, maybe it will turn out to be a bug.

      Despite the different representation of the compiled regular expression in 5.8.8 (the only difference is the newline), the problem is the same: \E is quoted instead of being interpreted as the end of \Q.

      Thank you for your answer!

Re: Weird quoting with /x modifier
by ikegami (Patriarch) on May 31, 2010 at 03:42 UTC
    "#" (like "$" and "@") apparently has higher precedence than "\Q". You assumed the opposite.

      I don't think so.

      perlre says that # should be backslashed or quoted with \Q..\E. So if # actually has higher precedence in a regexp with the /x modifier, it is a bug, because it does not work as documented.

      Second, if # had higher precedence than \Q..\E, the string would simply be trimmed after the # character.

        So if # actually has higher precedence in a regexp with the /x modifier, it is a bug, because it does not work as documented.

        True. You can use perlbug to file a report.

        If # had higher precedence than \Q..\E, the string would simply be trimmed after the # character.

        False. "#" doesn't trim anything.

        >perl -e"print qr/4d#sd/x"
        (?x-ism:4d#sd
        )
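A small sketch of ikegami's point, with illustrative test strings: under /x the comment text stays in the stringified pattern (which is why "#sd" shows up in the output above), but the regex engine ignores it when matching, so only "4d" is actually required.

```perl
use strict;
use warnings;

my $re = qr/4d#sd/x;

# The comment is still part of the pattern's string form...
print "stringified: $re\n";
print "comment kept in string form\n" if "$re" =~ /#sd/;

# ...but it is ignored when matching: only "4d" is required.
print "'4d' matches\n"         if "4d" =~ $re;
print "'sd' does not match\n" unless "sd" =~ $re;
```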
Re: Weird quoting with /x modifier
by QM (Parson) on Nov 01, 2013 at 10:19 UTC
    Just stumbled across this.

    Has it been resolved? That is, what is the definitive expected behavior/precedence with respect to \Q...\E and #?

    Does the documentation need clarifying?

    Update:

    I just reran the example above on 5.10.1 and still get the \E in the output:

    use YAPE::Regex::Explain;
    $regex1 = qr{\QHello# World\E}x;
    $parser1 = YAPE::Regex::Explain->new($regex1)->explain;
    print "$parser1\n"

    produces:

    The regular expression:
    (?x-ims:Hello\#\ World\\E)
    matches as follows:
    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?x-ims:                 group, but do not capture (disregarding
                             whitespace and comments) (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n):
    ----------------------------------------------------------------------
      Hello                    'Hello'
    ----------------------------------------------------------------------
      \#                       '#'
    ----------------------------------------------------------------------
      \                        ' '
    ----------------------------------------------------------------------
      World                    'World'
    ----------------------------------------------------------------------
      \\                       '\'
    ----------------------------------------------------------------------
      E                        'E'
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------

    And perlre for 5.10.1 says:

    The # character is also treated as a metacharacter introducing a comment, just as in ordinary Perl code. This also means that if you want real whitespace or # characters in the pattern (outside a character class, where they are unaffected by /x), then you'll either have to escape them (using backslashes or \Q...\E ) or encode them using octal or hex escapes.
    The 5.18 version is essentially the same.
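The last option the documentation mentions, encoding the characters numerically, sidesteps \Q...\E entirely. A minimal sketch, with an illustrative sample string: \x23 (octal \043) is "#" and \x20 (octal \040) is a space, and /x does not strip characters written as escapes. Plain backslash escapes are included for comparison.

```perl
use strict;
use warnings;

my $test = "Hello# World";

# Hex escapes: \x23 is '#', \x20 is ' '.
my $hex = qr{Hello\x23\x20World}x;

# Octal escapes for the same two characters.
my $oct = qr{Hello\043\040World}x;

# Plain backslash escapes, the other documented option.
my $bs = qr{Hello\#\ World}x;

for my $pair ([hex => $hex], [octal => $oct], [backslash => $bs]) {
    my ($name, $re) = @$pair;
    print "$name: ", ($test =~ $re ? "match" : "no match"), "\n";
}
```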

    -QM
    --
    Quantum Mechanics: The dreams stuff is made of