Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:
Dear monks,
Using the following code,

    print "- " =~ /\s*\p{Dash}\s*/;     # 1
    print "- " =~ /\s*\p{Dash}{1}\s*/;  # 2
    print "- " =~ /\s*-\s*/;            # 3

only lines 2 and 3 match. Debugging the regex makes clear why:
    Compiling REx "\s*\p{Dash}\s*"
    synthetic stclass "ANYOF[\11\12\14\15 ][{unicode_all}+utf8::Dash]".
    Final program:
       1: STAR (3)
       2:   SPACE (0)
       3: ANYOF[{unicode}+utf8::Dash] (15)
      15: STAR (17)
      16:   SPACE (0)
      17: END (0)
    stclass ANYOF[\11\12\14\15 ][{unicode_all}+utf8::Dash] minlen 1
    Matching REx "\s*\p{Dash}\s*" against "- "
    Matching stclass ANYOF[\11\12\14\15 ][{unicode_all}+utf8::Dash] against "- " (2 chars)
       1 <-> < >     |  1:STAR(3)
                          SPACE can match 1 times out of 2147483647...
       2 <- > <>     |  3:  ANYOF[{unicode}+utf8::Dash](15)
                            failed...
       1 <-> < >     |  3:  ANYOF[{unicode}\-...+utf8::Dash](15)
                            failed...
                          failed...
    Contradicts stclass... [regexec_flags]
    Match failed
    Freeing REx: "\s*\p{Dash}\s*"
The first \s* matches the second character of "- " and the \p{Dash} fails, since the regex does not backtrack beyond the last space. But shouldn't there be a match?
Version 3 obviously matches because of some optimization (searching for "-"), but why does version 2 match, since it should be equivalent to version 1?
Is this a bug or does my first regex simply not do what I think?
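For anyone who wants to reproduce this, here is a minimal self-contained sketch of the comparison. The helper name `match_results` and the `@patterns` table are only illustrative, not part of the original snippet, and the patterns are precompiled with `qr//`, which is an assumption that may not trigger exactly the same optimizer path as the inline matches above. No expected output is given because the result for pattern 1 appears to depend on the Perl version.

```perl
use strict;
use warnings;

# The three patterns from the question, keyed by the labels used there.
# (Hypothetical table, introduced only for this illustration.)
my @patterns = (
    [ 1 => qr/\s*\p{Dash}\s*/    ],
    [ 2 => qr/\s*\p{Dash}{1}\s*/ ],
    [ 3 => qr/\s*-\s*/           ],
);

# Returns a hash mapping each label to 1 (matched) or 0 (did not match).
sub match_results {
    my ($string) = @_;
    return map { $_->[0] => ( $string =~ $_->[1] ? 1 : 0 ) } @patterns;
}

# Test string from the question: a hyphen followed by one space.
my %result = match_results("- ");

printf "pattern %d: %s\n", $_, $result{$_} ? "match" : "no match"
    for sort keys %result;
```

Adding `use re 'debug';` at the top of the script should print a compile and match trace of the kind quoted above for each pattern.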
Cheers, Andrew
Re: Strange re behaviour for /\s*\p{Dash}\s*/ - bug or feature?
by JavaFan (Canon) on Aug 24, 2010 at 11:37 UTC
by Anonymous Monk on Aug 25, 2010 at 14:39 UTC