The pared-down code below attempts to split incoming lines on XX, but only when XX isn't inside a single-quoted string:

    #!perl
    while (<DATA>) {
      chomp;
      print "for [$_]:\n";
      # split on XX, but only when it's not in
      # single quotes.
      while (m%
          ((?:[^']*?        # Some unquoted stuff
            (?:'[^']*')?    # optional quoted stuff
           )*?)             # many times, but non-greedy
          XX |              # field separator, or
          (.+) %xg)         # a non-blank final field
      {
        # field value should now be in defined($2)?$2:$1
        my ($t1, $t2) = qw(undef undef);
        $t1 = "[$1]" if defined($1);
        $t2 = "[$2]" if defined($2);
        print "\t$t1\t$t2\n";
      }
    }
    __END__
    aXXbXXc
    abcd
    a little 'quote XX' quote stuff XX other

Now here's where it gets weird. The trailing fields are coming out one character at a time. Here's the output:
    for [aXXbXXc]:
        [a]     undef
        [b]     undef
        undef   [c]
    for [abcd]:
        undef   [a]
        undef   [b]
        undef   [c]
        undef   [d]
    for [a little 'quote XX' quote stuff XX other]:
        [a little 'quote XX' quote stuff ]      undef
        undef   [ ]
        undef   [o]
        undef   [t]
        undef   [h]
        undef   [e]
        undef   [r]
It's as though perl had decided that my last + sign in the regular expression should be non-greedy despite the fact that it's not followed by a ?.
What's going on here? It gets even more bizarre:
If I replace the first half with a pattern that really should be equivalent, this behavior goes away:
    #!perl
    while (<DATA>) {
      chomp;
      print "for [$_]:\n";
      # split on XX, but only when it's not in
      # single quotes.
      while (m%
          ([^']*?           # Some unquoted stuff
            (?:'[^']*'      # optionally, quoted followed by
               [^']*?)*? )  # unquoted stuff. non-greedy
          XX |              # field separator, or
          (.+) %xg)         # a non-blank final field
      {
        # field value should now be in defined($2)?$2:$1
        my ($t1, $t2) = qw(undef undef);
        $t1 = "[$1]" if defined($1);
        $t2 = "[$2]" if defined($2);
        print "\t$t1\t$t2\n";
      }
    }
    __END__
    aXXbXXc
    abcd
    a little 'quote XX' quote stuff XX other
To summarize: I have a regular expression match that is of the form m%(foo)XX|(.+)%g, where foo is a slightly complicated expression with no captures. When I run it, I get single character results repeatedly in $2. When I replace foo with a different complicated expression that should be equivalent, I suddenly get multiple characters in $2.
I've verified this behavior with Cygwin's perl 5.8.6 and ActiveState's 5.6.1 (build 635).
-- @/=map{[/./g]}qw/.h_nJ Xapou cets krht ele_ r_ra/; map{y/X_/\n /;print}map{pop@$_}@/for@/
In reply to This regex seems to have splattered non-greedy everywhere by fizbin