Re: Bizarreness in ?PATTERN? and g
by bmann (Priest) on Jun 04, 2004 at 05:01 UTC
|
It's by design. I did the same thing once.
From perlop:
?PATTERN?
This is just like the /pattern/ search, except that it matches
only once between calls to the reset() operator. This is a
useful optimization when you want to see only the first
occurrence of something in each file of a set of files, for
instance. Only ?? patterns local to the current package are
reset.
while (<>) {
if (?^$?) {
# blank line between header and body
}
} continue {
reset if eof; # clear ?? status for next file
}
This usage is vaguely deprecated, which means it just might
possibly be removed in some distant future version of Perl,
perhaps somewhere around the year 2168.
The solution? Use a different delimiter, but I'm sure you already know that ;)
|
No no no, you're missing my point. I realize that ?? only matches once, but shouldn't the second regex be separate from the first regex?
/regex1/
/regex2/
Regex1 and 2 shouldn't have anything to do with each other, should they? But in my example, the two ??'s are affecting each other, even though they're completely separate!
|
$_="a1 a2 a3";
while(?a(\d)?){ print $1; }
while(?a(\d)?){ print $1; }
__END__
11
Updated to add in the snippet.
Re: Bizarreness in ?PATTERN? and g
by BrowserUk (Patriarch) on Jun 04, 2004 at 05:00 UTC
|
perl> $_="a1 a2 a3";
if( /a(\d)/g ){ print $1, ' ', pos( $_ ); }
if( /a(\d)/g ){ print $1, ' ', pos( $_ ); }
1 2
2 5
Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
|
Well yeah, because the ??'s only return true. But surely when you repeat the code, it should be a completely separate regex. Why should its match be determined by the first regex?
|
Re: Bizarreness in ?PATTERN? and g
by Mr. Muskrat (Canon) on Jun 04, 2004 at 05:04 UTC
|
That is not weird at all. Perlop says this:
?PATTERN?
This is just like the /pattern/ search, except that it matches only once between calls to the reset() operator.
The reason the second snippet works is because you have two separate instances of the same pattern match. The g allows you to continue with the next match (whether that is due to a reset or a separate pattern match). Observe:
$_="a1 a2 a3";
while(?a(\d)?g){
print $1;
reset;
}
__DATA__
123
If you remove that g, you end up with an infinite loop of 1's. Remove the reset and the g and you are right back to the output being a single, solitary 1.
Re: Bizarreness in ?PATTERN? and g
by beth (Scribe) on Jun 04, 2004 at 05:11 UTC
|
Buu's code doesn't seem so bizarre - as many people have commented, it kinda makes sense.
But compare this:
$s="a1 a2 a3";
for (1..2) {
print $1 if $s =~ ?a(\d+)?g;
}
with this:
$s="a1 a2 a3";
print $1 if $s =~ ?a(\d+)?g;
print $1 if $s =~ ?a(\d+)?g;
I would expect these to be equivalent ... but no, the first example prints "1" and the second prints "1 2".
The plot thickens!
update 2004-06-04 01:41 - added conditionals so the first snippet prints "1" rather than "1 1".
--
eval pack("H*", "7072696e74207061636b2822482a222c202236613631373036382229");
# japh or forkbomb? You decide!
|
So, hmm. It seems ?? only acts differently if it's inside a loop. The perlop docs mention reset, and reset is only really useful for loops, so perhaps ?? is too? Hopefully someone who knows will come clear this up.
|
Each occurrence of ?? in code will match only once between resets. If you have more than one ??, each one can match once.
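A minimal sketch of that rule, assuming a perl recent enough that the pattern has to be spelled m?...? (the bare ?...? form was removed in 5.22). The helper first_hit is made up for illustration; it is not anything from perlop:

```perl
use strict;
use warnings;

# Hypothetical helper: it contains exactly one m?? op, so every call
# re-executes the same once-only match.
sub first_hit {
    my ($text) = @_;
    if ($text =~ m?a(\d)?) { return $1 }
    return 'no match';
}

my @results;
push @results, first_hit("a1 a2 a3");   # the op matches for the first time
push @results, first_hit("a1 a2 a3");   # same op, already used up
reset;                                  # re-arm the m?? ops in this package
push @results, first_hit("a1 a2 a3");   # the op can match once more

print join(", ", @results), "\n";       # 1, no match, 1
```

The string passed in never changes; only the state attached to that one m?? op does.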
|
japhy took a look at this but I guess he didn't post about it yet. Basically, loops and repeated statements are not interchangeable - they're compiled differently. He suggested running the following two tests, which are quite enlightening:
perl -MO=Terse -e '/x/; /x/;'
perl -MO=Terse -e '/x/ for 1, 2'
--
eval pack("H*", "7072696e74207061636b2822482a222c202236613631373036382229");
# japh or forkbomb? You decide!
Re: Bizarreness in ?PATTERN? and g
by Anonymous Monk on Jun 04, 2004 at 04:58 UTC
|
perl -le '$s="a1, a3 - a5\na11"; while ($i<10) { $i++; $s=~?a(\d+)?g; print $1; }'
prints 1, 10 times
perl -le '$s="a1, a3 - a5\na11"; while ($i<10) { $i++; $s=~/a(\d+)/g; print $1; }'
prints 1 3 5 11 11 1 3 5 11 11
vs.
perl -le '$s="a1, a3 - a5\na11"; $s=~?a(\d+)?g; print $1; $s=~?a(\d+)?g; print $1; $s=~?a(\d+)?g; print $1; $s=~?a(\d+)?g; print $1;'
which prints 1 3 5 11
so the CODE ??g repeated in the SOURCE seems to behave differently than while (1..4) { ??g }, i.e. the CODE ??g EXECUTED multiple times in the code.
I am confused.
Edited by Chady -- added code tags.
Re: Bizarreness in ?PATTERN? and g
by CountZero (Bishop) on Jun 04, 2004 at 05:55 UTC
|
$value1="a1 a2 a3 a4";
$value2=$value1;
while($value1=~/a(\d)/g){ print $1; last; }
while($value2=~/a(\d)/g){ print $1; }
does indeed output 11234, so what matters is not whether the regex is the same, but whether the variable you are matching against is the same.
CountZero "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law
Re: Bizarreness in ?PATTERN? and g
by Roy Johnson (Monsignor) on Jun 04, 2004 at 13:43 UTC
|
The //g modifier is tied to pos(), which is tied to the scalar being matched against. All matches on the same scalar with the g option read and set the same pos().
The ?? is more like a flip-flop, where the counter is attached to the expression itself, so separate ?? expressions each match once, even if they use the same pattern and/or match against the same scalar.
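The pos() half of that can be seen without ?? at all. A small sketch, with variable names and the second pattern invented for the example: two different /g regexes on one scalar share its pos(), while a copy of the string keeps its own.

```perl
use strict;
use warnings;

my $text  = "a1 b2 a3";
my $other = $text;      # same contents, but a different scalar with its own pos()
my @log;

# Two different regexes, one scalar: the second resumes where the first stopped.
push @log, "$1 at " . pos($text)  if $text  =~ /a(\d)/g;
push @log, "$1 at " . pos($text)  if $text  =~ /[ab](\d)/g;

# The same pattern on the other scalar starts from the beginning.
push @log, "$1 at " . pos($other) if $other =~ /[ab](\d)/g;

print "$_\n" for @log;   # 1 at 2, 2 at 5, 1 at 2 (one per line)
```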
The PerlMonk tr/// Advocate
Re: Bizarreness in ?PATTERN? and g
by Anonymous Monk on Jun 04, 2004 at 05:06 UTC
|
Sorry about the previous post, didn't read the FAQ
here it is PROPERLY FORMATTED
perl -le '$s="a1, a3 - a5\na11"; while ($i<10) { $i++; $s=~?a(\d+)?g; print $1; }'
prints 1, 10 times
perl -le '$s="a1, a3 - a5\na11"; while ($i<10) { $i++; $s=~/a(\d+)/g; print $1; }'
prints 1 3 5 11 11 1 3 5 11 11
vs.
perl -le '$s="a1, a3 - a5\na11"; $s=~?a(\d+)?g; print $1; $s=~?a(\d+)?g; print $1; $s=~?a(\d+)?g; print $1; $s=~?a(\d+)?g; print $1;'
which prints 1 3 5 11
so the CODE ??g repeated in the SOURCE seems to behave differently than while (1..4) { ??g }, i.e. the CODE ??g EXECUTED multiple times in the code.
I am confused.
Re: Bizarreness in ?PATTERN? and g
by integral (Hermit) on Jun 04, 2004 at 15:37 UTC
|
As has already been said, a match in scalar context with /g doesn't 'reset' pos(), so the next match with /g on that variable will start from that position.
The key error in your understanding is that "they're two completely separate regexen". They're not, because they're connected by pos() since they have /g.
The first examples, using ??, work just the same as the latter case with m//; the once-only nature of ?? is just an additional constraint. ?? works exactly like m// with regard to setting pos() and starting from pos() in scalar context with /g.
So the key point is that the position of the last match is being carried over between the statements, which you weren't expecting.
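If you don't want the position carried over, you can clear it yourself. A rough sketch with plain m//g (the variable names are mine):

```perl
use strict;
use warnings;

my $s = "a1 a2 a3";
my @hits;

push @hits, $1 if $s =~ /a(\d)/g;   # starts at 0, leaves pos($s) at 2
push @hits, $1 if $s =~ /a(\d)/g;   # separate statement, resumes from pos($s)

pos($s) = undef;                    # forget the carried-over position
push @hits, $1 if $s =~ /a(\d)/g;   # starts from the beginning again

print "@hits\n";                    # 1 2 1
```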
--
integral, resident of freenode's #perl
Re: Bizarreness in ?PATTERN? and g
by Anonymous Monk on Jun 04, 2004 at 18:30 UTC
|
Hi all. japhy speaking. Here's the run-down:
- The regex variables are not reset or changed after a failed regex.
- The /g modifier, in scalar context, tells the regex to match once,
and then update pos($str) to wherever the regex ended in the string. The
next time that string is matched against by a regex with the /g flag, the
regex will start looking NO SOONER than pos($str) in the string. pos()
is not tied to a regex, it's tied to a string.
- A m?? regex matches only once (in between calls to reset()). That
behavior is tied to THAT specific regex.
- Perl is compiled. That means /x/ for 1, 2; is different
from /x/; /x/;.
Put it all together and you have this fact:
# code 1
$str = "abc";
for (1, 2, 3) {
print $1 if $str =~ ?(.)?g;
}
# code 2
$str = "abc";
print $1 if $str =~ ?(.)?g;
print $1 if $str =~ ?(.)?g;
print $1 if $str =~ ?(.)?g;
The first code only prints 'a'. The second code prints 'abc'. This is
because the first code has only one PMOP (Perl's internal representation of
a pattern match operation), whereas the second code has THREE of them. Each
PMOP has its own flags, such as the "I'm a m?? regex" flag.
Now for a bit of fun. What does this code print?
$str = "abc";
for (1, 2, 3) {
$str =~ ?(.)?g;
print $1;
}
Does it print "a" (and then two empty strings)? No, it prints "aaa". Why?
Because the regex variables ($1, et al.) are in a *slightly* larger scope
than you'd expect: they retain their values for the duration of that for
loop. It would be similar to saying:
$str = "abc";
{
local ($_1, $_2, ...);
for (1, 2, 3) {
$str =~ ?(.)?g and ($_1, $_2, ...) = ($1, $2, ...);
print $_1;
}
}
except, of course, that you don't have to.
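For anyone who wants to run the quiz on a newer perl, here is a sketch of it under strict and warnings. It assumes the m?...? spelling (bare ?...? was removed in 5.22), and it collects the values into @printed, a name of my own, instead of printing inside the loop:

```perl
use strict;
use warnings;

my $str = "abc";
my @printed;

for my $i (1 .. 3) {
    $str =~ m?(.)?g;       # one op: it can only succeed on the first pass
    push @printed, $1;     # a failed match leaves $1 as it was
}

print join("", @printed), "\n";   # aaa
```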