Re: Bug or feature? s/// and the g option
by dsheroh (Monsignor) on Oct 14, 2007 at 15:39 UTC
I'd guess that it's undefined because, in the final example, the last partial match (B...) fails and apparently clears $1 before doing so. Reversing the order of those lines avoids the warning:
$test = <<'END';
ABFG
ABCD
END
Edit: I poked at it a little more and seem to have confirmed my theory. If the final B in the data string is not followed by C?[DE], $1 ends up undef, regardless of where that B is.

Interesting. So a failed match will clear the $1 variable.
I'm still a little stumped by Marco's original example:
$test = <<'END';
XXWY
XXWZ
END
$count = $test =~ s#(XXW?Y)##gi;
print "REMOVED: <<$1>>\nCOUNT: $count\n";
That fails. But remove the "i" in the substitution and $1 is defined. Why would case-insensitive make any difference?
--rjk

AIUI, without the i, before the regex engine proper is entered, there's an optimization that makes it search for an XX followed later by a Y. That optimization rejects a match, and when that happens, it bypasses the bug that's setting $1 to undef. Seems to be fixed for 5.10.0.
Re: Bug or feature? s/// and the g option
by graff (Chancellor) on Oct 14, 2007 at 16:17 UTC
Personally, I wouldn't consider it a bug, but rather a constraint on the use of capturing parens and references to captures in the context of the "g" modifier: the "$1,$2,..." can only be used reliably in the replacement side of s///g, and cannot be counted on as defined outside the scope of that operator.
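A minimal sketch of that constraint, reusing the XXWY/XXWZ data from earlier in the thread: the capture is read on the replacement side, via /e, instead of after the operator returns. The @removed array and the `||= 0` normalisation are illustrative additions, not something anyone in the thread posted.

```perl
use strict;
use warnings;

my $test = "XXWY\nXXWZ\n";
my @removed;    # hypothetical collector for the captured text

# /e evaluates the replacement as Perl code once per successful match,
# so $1 is read while it still holds that match's capture.
# The last expression ('') is what actually gets substituted in.
my $count = $test =~ s/(XXW?Y)/push @removed, $1; ''/gie;
$count ||= 0;   # s/// returns an empty string when nothing matched

print "REMOVED: <<$_>>\n" for @removed;
print "COUNT: $count\n";
```

Nothing here depends on what $1 holds once s///g has finished, so it should sidestep the behaviour under discussion whichever way it is classified.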
Re: Bug or feature? s/// and the g option
by ysth (Canon) on Oct 14, 2007 at 20:07 UTC
To those who call this "undefined behavior":
Just because it's not a very useful thing to do doesn't mean it's not a bug. $1, etc., shouldn't be affected by unsuccessful matches. This is clearly spelled out in the documentation for $& (which suffers the same bug as $1), even if the entry for $<digits> is for some reason missing the "successful" qualification.

Especially don't call it a feature.
Some people might try to categorize this as a hidden feature. My thought: although not stated itself, a hidden feature cannot violate what is stated, i.e. it must be logically consistent with the rest.
The best way to verify whether something is a feature is, obviously, to ask the creators; if that's not feasible, the logic above applies.
Re: Bug or feature? s/// and the g option
by Krambambuli (Curate) on Oct 14, 2007 at 16:57 UTC
I'm curious what the RE gurus looking into the monastery will say. However, so far, it seems to me to be neither a bug nor a feature, but just an oddity that comes from a somewhat unfortunate [mis]use of /g.
As I understand it, /g is not a substitute for /sm - it is sort of an iterator that lets you steadily step through the matches in a string, if you need a step-by-step, match-by-match approach. See it in conjunction with pos.
Successive, iterated substitution, which /g seems to imply, is clearly weird: the intermediate string resulting after every substitution is different from both the initial string and the final result, and there is no intent to use it as anything other than 'something' unfinished.
Think about something like
my $test = 'AAAA';
my $x1 = $1 if $test =~ s/A/AA/g;
my $x2 = $1 if $test =~ s/A/AA/g;
...
Would you expect any intermediate results?! I think not, and so I wouldn't expect anything from $1, $2, ... after an s///g, much as I don't really trust a for-loop control variable to hold anything I can rely on once the loop has finished.
I remain curious about what others think/know about it.

As I understand it, /g is not a substitute for /sm
print is not a substitute for system. True, but obvious.
In fact, not only are they not equivalent, s, m and g are orthogonal.
s affects what . matches.
m affects what ^ and $ match.
g affects the number of substitutions that will be made.
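A short sketch of the three modifiers acting independently; the sample string and variable names are made up for illustration.

```perl
use strict;
use warnings;

my $text = "ab\ncd\n";

# /s: lets . match a newline as well
my $dot_plain = $text =~ /b.c/  ? 1 : 0;
my $dot_s     = $text =~ /b.c/s ? 1 : 0;

# /m: lets ^ and $ match at line boundaries inside the string
my $anchor_plain = $text =~ /^cd$/  ? 1 : 0;
my $anchor_m     = $text =~ /^cd$/m ? 1 : 0;

# /g: replaces every match instead of only the first
(my $first_only = $text) =~ s/[a-d]/X/;
(my $every      = $text) =~ s/[a-d]/X/g;

print "dot: $dot_plain vs $dot_s\n";           # dot: 0 vs 1
print "anchor: $anchor_plain vs $anchor_m\n";  # anchor: 0 vs 1
print "first: $first_only";                    # first: Xb / cd
print "every: $every";                         # every: XX / XX
```

Each modifier changes exactly one thing, so any combination of them can be used together.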
/g is sort of an iterator that lets you steadily step through the matches in a string
No. You're thinking of m//g in void and scalar context. That's neither the case for m//g in list context nor for s///g.
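To make the distinction concrete, here is a small sketch with a made-up string. Only the scalar-context loop behaves like an iterator tied to pos; the list-context match and s///g each do all their work in one evaluation.

```perl
use strict;
use warnings;

my $str = "a1b2c3";

# List context: m//g returns every capture at once, no stepping involved.
my @all = $str =~ /(\d)/g;

# Scalar context: each evaluation finds the next match and updates pos().
my @steps;
while ($str =~ /(\d)/g) {
    push @steps, "$1 ends at " . pos($str);
}

# s///g: all substitutions happen in one call, which returns their count.
my $copy  = $str;
my $count = $copy =~ s/\d/#/g;

print "all: @all\n";            # all: 1 2 3
print "$_\n" for @steps;        # 1 ends at 2, 2 ends at 4, 3 ends at 6
print "count: $count, $copy\n"; # count: 3, a#b#c#
```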
my $x1 = $1 if $test =~ s/A/AA/g;
Off-topic, but my $var if ... is wrong. my has a run-time effect, so it shouldn't be executed conditionally.
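One way to write that without a conditional my, as a sketch: declare first, then assign under the condition. The capture group and the single (non-/g) substitution are assumptions added so that $1 has a well-defined value here; the quoted line had no capture at all.

```perl
use strict;
use warnings;

my $test = 'AAAA';

# Declare unconditionally, then assign only if the substitution succeeded.
my $x1;
$x1 = $1 if $test =~ s/(A+)/B/;

# The same idea as a single expression, on a fresh copy of the data.
my $again = 'AAAA';
my $x2 = $again =~ s/(A+)/B/ ? $1 : undef;

print "x1: $x1, test: $test\n";
print "x2: $x2, again: $again\n";
```

Either form keeps the declaration's run-time effect out of the conditional.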
Thank you, ikegami - all objections gratefully accepted.
Turning back to the original question: so _should_ $1 contain something you could normally rely upon after a successful s///g?
Re: Bug or feature? s/// and the g option
by rowdog (Curate) on Oct 15, 2007 at 08:46 UTC
In my mind, this is a documentation bug. If you match instead of substitute, you'll get the expected behavior. Also, the substitution does, in fact, substitute exactly what you ask it to.
I'm grateful that this question led me to spend a couple hours re-reading perlre et al but I never did find anything that explained this behavior. This could easily be an oversight on my part.
To return to the original question, I don't think this is either a bug or a feature but, rather, a design decision: when replacing multiple times, should $1 contain the value of the last successful match or should it be undef to reflect the fact that the last match failed? I wouldn't presume to say that the wrong choice was made but it's obvious what that choice was.