1. Why the (xx+) part does not match the entire string.
Used on its own, it would. The thing is that when it is combined with \1+ in the regex: /(xx+)\1+/,
it is now requred to match 2 or more x's followed by 1 or more of the same number of x's again.
Which just doesn't work if (xx+) gobbles all the x's to start with!
It still is greedy, in that 6x's will match as xxx and xxx rather than xx and 2 lots of xx, but it can't be TOO greedy or the match doesn't work.
2. Why $1 is not exactly the same as \1
As with most things in Perl, these two are kinda the same but not identical.
$1 is actually what was matched in the LAST pattern match and \1 is a backreference which only works within the CURRENT pattern match.
If you use \1 outside of a regexp you are likely to just get a reference to a constant '1'.
Gramatical errors aside, see if this code makes anything clearer.
#!/usr/local/bin/perl
($_ = 'of Perl Wisdom') =~ s/(.*)/Just Another Perl \1, /;
print if /(\S+ )+of \1/;
($_ = 'Hacker') =~ s/(.*)/Just Another Perl \1, /;
print if /(\S+ )+$1/;
print "and Just another Perl ",\1;
Note that in this case you could use $1 rather than \1 in the first and third lines but not in the second. And that you cannot use \1 instead of $1 in the fourth.
An interesting? variaton.
By removing the negation and printing out what matched (this time using $1) you can get the
largest factor of a number. (or that many x's anyway) )
perl -e 'while($l++<99){$_.=x;print $l,$1,$/if/^x$|^(xx+)\1+$/}'
Or a half a christmas tree. :)
| [reply] [d/l] [select] |
1) The regex works because of backtracking. Perl's regular
expressions *want* to match, and they keep trying until
they've exhausted every possibility that will let them
match.
Once you understand
that, you'll understand why it "doesn't" match the whole
string in $_. I say "doesn't" because you're right, in a
sense--the pattern *does* match the entire string. But
then it moves on to the next part of the regex (the \1+)
and sees that, if the first part matches the entire string,
there's nothing left for the \1+ to match. So the regex
wouldn't match.
So the matcher moves back one x in the
string, and tries to match again; and it keeps moving
backwards (backtracking) through the string until it
finds a match. If it doesn't find a match, the match
fails, and you have a prime--because there is no pattern
such that the pattern repeated a certain number of times
matches the string; and since the string is x repeated
N number of times, this means that there is no number such
that the number times another number equals the original
number. That sounded confusing--I suppose that's why symbolic
notation was invented. :)
For more on backtracking, look in perlre.
2) \1 is used because \1 is used "inside" of a pattern
(on the left-hand side of a substitution operator, or
inside a matching operator); you use $1 "outside" of a
pattern (on the right-hand side of a substitution operator,
or outside a matching operator). More in perlre. | [reply] |