in reply to RE: RE: Primes
in thread Primes

I think primes is really cool! I would like to fully understand how it works and I have 2 questions about the second part of the regex: 1. why isn't (xx+) greedy? I mean why doesn't it match the whole string of x in $_? 2. I thought it was possible substitute \1 with $1: ^(xx+)$1+$, but this regex doesn't work correctly. why? TIA for any explanation. marcos

Replies are listed 'Best First'.
\1 Greed, $1
by kryten (Scribe) on Apr 10, 2000 at 14:55 UTC
    1. Why the (xx+) part does not match the entire string.
    Used on its own, it would. The thing is that when it is combined with \1+ in the regex: /(xx+)\1+/, it is now requred to match 2 or more x's followed by 1 or more of the same number of x's again. Which just doesn't work if (xx+) gobbles all the x's to start with! It still is greedy, in that 6x's will match as xxx and xxx rather than xx and 2 lots of xx, but it can't be TOO greedy or the match doesn't work.

    2. Why $1 is not exactly the same as \1
    As with most things in Perl, these two are kinda the same but not identical. $1 is actually what was matched in the LAST pattern match and \1 is a backreference which only works within the CURRENT pattern match. If you use \1 outside of a regexp you are likely to just get a reference to a constant '1'. Gramatical errors aside, see if this code makes anything clearer.
    #!/usr/local/bin/perl ($_ = 'of Perl Wisdom') =~ s/(.*)/Just Another Perl \1, /; print if /(\S+ )+of \1/; ($_ = 'Hacker') =~ s/(.*)/Just Another Perl \1, /; print if /(\S+ )+$1/; print "and Just another Perl ",\1;
    Note that in this case you could use $1 rather than \1 in the first and third lines but not in the second. And that you cannot use \1 instead of $1 in the fourth.

    An interesting? variaton.
    By removing the negation and printing out what matched (this time using $1) you can get the largest factor of a number. (or that many x's anyway) )
    perl -e 'while($l++<99){$_.=x;print $l,$1,$/if/^x$|^(xx+)\1+$/}'
    Or a half a christmas tree. :)
RE: RE: RE: RE: Primes
by btrott (Parson) on Apr 07, 2000 at 21:40 UTC
    1) The regex works because of backtracking. Perl's regular expressions *want* to match, and they keep trying until they've exhausted every possibility that will let them match.

    Once you understand that, you'll understand why it "doesn't" match the whole string in $_. I say "doesn't" because you're right, in a sense--the pattern *does* match the entire string. But then it moves on to the next part of the regex (the \1+) and sees that, if the first part matches the entire string, there's nothing left for the \1+ to match. So the regex wouldn't match.

    So the matcher moves back one x in the string, and tries to match again; and it keeps moving backwards (backtracking) through the string until it finds a match. If it doesn't find a match, the match fails, and you have a prime--because there is no pattern such that the pattern repeated a certain number of times matches the string; and since the string is x repeated N number of times, this means that there is no number such that the number times another number equals the original number. That sounded confusing--I suppose that's why symbolic notation was invented. :)

    For more on backtracking, look in perlre.

    2) \1 is used because \1 is used "inside" of a pattern (on the left-hand side of a substitution operator, or inside a matching operator); you use $1 "outside" of a pattern (on the right-hand side of a substitution operator, or outside a matching operator). More in perlre.