Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks,

using the following code

print "- " =~ /\s*\p{Dash}\s*/;    # 1
print "- " =~ /\s*\p{Dash}{1}\s*/; # 2
print "- " =~ /\s*-\s*/;           # 3
only lines 2 and 3 match. Debugging the regex makes clear why:
Compiling REx "\s*\p{Dash}\s*"
synthetic stclass "ANYOF[\11\12\14\15 ][{unicode_all}+utf8::Dash]".
Final program:
   1: STAR (3)
   2:   SPACE (0)
   3: ANYOF[{unicode}+utf8::Dash] (15)
  15: STAR (17)
  16:   SPACE (0)
  17: END (0)
stclass ANYOF[\11\12\14\15 ][{unicode_all}+utf8::Dash] minlen 1
Matching REx "\s*\p{Dash}\s*" against "- "
Matching stclass ANYOF[\11\12\14\15 ][{unicode_all}+utf8::Dash] against "- " (2 chars)
   1 <-> < >   |  1:STAR(3)
                    SPACE can match 1 times out of 2147483647...
   2 <- > <>   |  3:  ANYOF[{unicode}+utf8::Dash](15)
                      failed...
   1 <-> < >   |  3:  ANYOF[{unicode}\-...+utf8::Dash](15)
                      failed...
                    failed...
Contradicts stclass... [regexec_flags]
Match failed
Freeing REx: "\s*\p{Dash}\s*"

The first \s* matches the second character of "- " and the \p{Dash} fails, since the regex does not backtrack beyond the last space. But shouldn't there be a match?

Version 3 obviously matches because of some optimization (searching for "-"), but why does version 2 match, since it should be equivalent to version 1?

Is this a bug or does my first regex simply not do what I think?

Cheers, Andrew

Re: Strange re behaviour for /\s*\p{Dash}\s*/ - bug or feature?
by JavaFan (Canon) on Aug 24, 2010 at 11:37 UTC
    Bug. An even simpler case:
    "-" =~ /\s*\p{Dash}/;
    This fails. Removing the \s* or adding {1} after the \p{Dash} makes it succeed.

    Please use perlbug to file a bug report.
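
For anyone who wants to check their own perl, here is a minimal sketch that runs the three patterns from the question plus the reduced case from the reply, and reports which ones match. The labels and the %matched hash are made up for this sketch and are not from the thread. The outcome depends on the perl version: on an affected build the two plain \p{Dash} patterns should report NO MATCH, as described above.

```perl
use strict;
use warnings;

# Each entry: [ label, subject string, pattern ].
# The labels are illustrative only; the patterns are copied from the thread.
my @cases = (
    [ 'plain \p{Dash}', '- ', qr/\s*\p{Dash}\s*/    ],   # version 1
    [ '\p{Dash}{1}',    '- ', qr/\s*\p{Dash}{1}\s*/ ],   # version 2
    [ 'literal -',      '- ', qr/\s*-\s*/           ],   # version 3
    [ 'reduced case',   '-',  qr/\s*\p{Dash}/       ],   # from the reply
);

my %matched;
for my $case (@cases) {
    my ($label, $string, $re) = @$case;

    # Record 1 or 0 so the result is easy to inspect afterwards.
    $matched{$label} = $string =~ $re ? 1 : 0;

    printf "%-16s %s\n", $label, $matched{$label} ? 'match' : 'NO MATCH';
}
```

Adding use re 'debug'; at the top of the script prints the compiled program and the match trace for each pattern, in the same style as the dump in the question. The output of perl -v is also worth including in the bug report.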