schnarff has asked for the wisdom of the Perl Monks concerning the following question:

This should be a relatively simple question, but I've been completely unable to find an answer after much searching, so I apologize if the answer is obvious.

If I have a PCRE with a subpattern, for example:
/(\d\s+foo)/
I know that I can repeat that subpattern easily with repetition quantifiers, e.g.
/(\d\s+foo){1,3}/
and that I can force a repeat of what had been matched, even after intervening characters, like this:
/(\d\s+foo)[^\r\n]*\1/
however, I don't know whether it's possible to repeat the conditions imposed by the subpattern (i.e. have a second occurrence of the subpattern without spelling it all the way out). Is there a way of taking, say,
/(\d\s+foo)[^\r\n]*(\d\s+foo)/
and representing the second, identical subpattern with some symbol/metacharacter? I know that, in terms of pure functionality, such a distinction would be irrelevant, but frankly the redundancy wounds my sense of Perl beauty, and thus I'd prefer a cleaner, nicer way to do this. :-)

Also, if such a method does exist, is it equally valid for named subpatterns?

Thank You,
Alex Kirk
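A later editorial note: what the question describes is a "subroutine call" to a group. `(?1)` re-applies the *pattern* of group 1, while `\1` demands the same *text* again. PCRE documents this syntax, and Perl gained it in 5.10 along with named groups `(?<name>...)` called as `(?&name)`. PCRE also accepts the Python-style spelling `(?P>name)`. The minimal Perl sketch below assumes Perl 5.10 or later. Check your PCRE version's documentation for which spellings it actually supports.

```perl
use strict;
use warnings;
use 5.010;   # (?1), (?<name>...) and (?&name) need Perl 5.10 or later

# (?1) re-runs the pattern of capture group 1 (not its matched text).
my $numbered = qr/(\d\s+foo)[^\r\n]*(?1)/;

# Named form: (?&item) re-runs the pattern of the group named "item".
my $named = qr/(?<item>\d\s+foo)[^\r\n]*(?&item)/;

for my $s ('4 foo4 foo', '4 foo5 foo', '4 foo bar') {
    my $n = $s =~ $numbered ? 'match' : 'no match';
    my $m = $s =~ $named    ? 'match' : 'no match';
    print "$s: numbered=$n named=$m\n";
}
# 4 foo4 foo: numbered=match named=match
# 4 foo5 foo: numbered=match named=match
# 4 foo bar: numbered=no match named=no match
```

Note that the called group does not capture: `$1` (or `$+{item}`) holds only the first occurrence.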

Re: PCRE: Repeating Subpatterns After Intervening Characters
by xdg (Monsignor) on Sep 14, 2005 at 22:04 UTC

    merlyn's on-target comment aside, I'd normally do that by assigning the repeated pattern to a variable and interpolating that into the regex pattern.

    # original (broken):
    # my $pat = "\d\s+foo";
    # fixed:
    my $pat = "\\d\\s+foo";
    my $regex = qr/$pat[^\r\n]*$pat/;

    Not sure about PCRE, but whatever programming language you use, I suspect you can construct a string using variables similarly to build up to your pattern.

    Update: Doh! Thanks, ikegami. I was trying to show the subpattern with strings for the cross-language analogy and slipped up.

    -xdg

    Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.

      That should be
      my $pat = "\\d\\s+foo";
      or better yet
      my $pat = qr/\d\s+foo/;

      Your $pat incorrectly matches "dssssfoo", and doesn't match "1 foo" as it should.

      my $pat = "\d\s+foo";
      print("$pat\n");  # ds+foo
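To make the escaping point concrete, here is a small Perl sketch comparing three spellings that keep the backslashes the regex engine needs: doubled backslashes in double quotes, single quotes (which do no `\d`/`\s` processing), and a `qr//` object. The test strings are the ones mentioned above; the hash of labels is just for illustration.

```perl
use strict;
use warnings;

# Each entry keeps "\d" and "\s" intact for the regex engine.
my %pat = (
    'escaped double quotes' => "\\d\\s+foo",   # backslashes doubled
    'single quotes'         => '\d\s+foo',     # no escape processing
    'qr// object'           => qr/\d\s+foo/,   # precompiled regex
);

for my $name (sort keys %pat) {
    my $p   = $pat{$name};
    my $ok  = '1 foo'    =~ /\A$p\z/ ? 'yes' : 'no';
    my $bad = 'dssssfoo' =~ /\A$p\z/ ? 'yes' : 'no';
    print "$name: matches '1 foo'? $ok; matches 'dssssfoo'? $bad\n";
}
```

Of the three, `qr//` is usually the safest choice in Perl because the pattern is checked once, when it is compiled.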
      Wow, you guys are amazingly quick to reply and helpful in those replies. That said, since I need to do this outside of Perl, in a pure PCRE expression, neither this nor ikegami's post is helpful.

      Seeing as how I just looked and found no PCREmonks, however, does anyone know if such a forum actually exists and I'm just bad at Googling, or if a forum with such a purpose exists under some different name/URL?

      Thanks Again,
      Alex
        Wow, you guys are amazingly quick to reply and helpful in those replies. That said, since I need to do this outside of Perl, in a pure PCRE expression, neither this nor ikegami's post is helpful.

        I disagree. We didn't use anything Perl-specific. For example, here it is in JScript:

        var pat = "\\d\\s+foo";
        var regexp = "(" + pat + ")[^\r\n]*(" + pat + ")";
        var re = new RegExp(regexp);
        re.exec(some_string);

        (Untested)

Re: PCRE: Repeating Subpatterns After Intervening Characters
by merlyn (Sage) on Sep 14, 2005 at 21:50 UTC
    Oops. You're looking for PCREmonks, not Perlmonks. There are definitely Perl answers to your question, but PCRE isn't Perl, and Perl isn't PCRE, so no answers will be directly applicable.

    At least you were nice enough to announce this as a PCRE problem. We've had past inquisitors ask about PHP and Java regexen without prior disclosure, which only leads to broken useless answers because we (sensibly) presume a Perl environment.

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.

      Oops. You're looking for PCREmonks, not Perlmonks.
      Darn. I knew it was too easy to reply with:
      my $regex = qr/(\d\s+foo)/;
      $some_string =~ /$regex[^\r\n]*$regex/;

      -QM
      --
      Quantum Mechanics: The dreams stuff is made of

        That doesn't capture properly. Fix:
        my $regex = qr/\d\s+foo/;
        $some_string =~ /($regex)[^\r\n]*($regex)/;

        Also, your solution doesn't scale (if capturing is desired). He can't, for example, write the following and still capture everything $regex matches:

        my $regex = qr/\d\s+foo/;
        $some_string =~ /(?:$regex[^\r\n]*)*($regex)/;
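The scaling problem is easy to see in Perl: a capture group inside a quantifier keeps only its last iteration, whereas a global match in list context returns every occurrence. A minimal sketch; the sample string is made up for illustration.

```perl
use strict;
use warnings;

my $regex = qr/\d\s+foo/;
my $s     = '1 foo x 2 foo y 3 foo';

# A quantified capture group keeps only its LAST iteration.
my ($last) = $s =~ /^(?:($regex)[^\r\n]*?)+$/;

# A global match in list context returns every occurrence instead.
my @all = $s =~ /($regex)/g;

print "last: $last\n";   # last: 3 foo
print "all:  @all\n";    # all:  1 foo 2 foo 3 foo
```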
        Oh...I didn't realize PCREmonks existed. I'll go ask them. Thanks for being friendly in the meantime. :-)

        Alex
Re: PCRE: Repeating Subpatterns After Intervening Characters
by Hue-Bond (Priest) on Sep 14, 2005 at 22:31 UTC
    Is there a way of taking, say,
    /(\d\s+foo)[^\r\n]*(\d\s+foo)/

    What about:

    my $re = qr/\d\s+foo/;

    sub match {
        local $_ = shift;
        /(\d\s+foo)[^\r\n]*\1/x      and print "1\n";
        /\d\s+foo [^\r\n]*\d\s+foo/x and print "2\n";
        /($re) [^\r\n]*\1/x          and print "3\n";
        /$re [^\r\n]*$re/x           and print "4\n";
    }

    match '4 foo4 foo';    ## easily matched
    match '4 foo5 foo';    ## a bit trickier

    __OUTPUT__
    1
    2
    3
    4
    2
    4

    We can see that all four regexps matched the first string, but to match the second we can't use the backreference and must resort to something else. I think isolating the common part and interpolating it wherever it's needed (regexp "4") is best.

    --
    David Serrano

Re: PCRE: Repeating Subpatterns After Intervening Characters
by ikegami (Patriarch) on Sep 14, 2005 at 21:59 UTC

    Taking advantage of specific features of your regexp,

    @matches = map { /^(\d\s+foo)/; $1 } /(?:\d\s+foo[^\r\n]*)/g;

    This will validate and return matches. If you just wish to validate, the following will suffice:

    # Specific case:
    /(?:\d\s+foo[^\r\n]*)/;

    # General case:
    $repeated = qr/\d\s+foo/;
    /(?:$repeated[^\r\n]*)$repeated/;

    Update: My capture version has two issues. And the pattern is still repeated.

    You really do want a parser for this:

    list  : term list_
            { [ $item[1], @{$item[2]} ] }

    list_ : sep term list_
            { [ $item[2], @{$item[3]} ] }
          | { [] }
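For comparison, the same "term, then any number of (separator, term)" shape can be walked by hand in plain Perl with `\G` and `m//gc`, without Parse::RecDescent. This is a swapped-in technique, not ikegami's grammar. The definitions of `$term` and `$sep` below are hypothetical, chosen to fit the example in this thread, and `parse_list` is an illustrative name.

```perl
use strict;
use warnings;

# Hypothetical rules: a "term" is \d\s+foo; a "sep" is any run of non-CR/LF
# characters that does not itself start a term.
my $term = qr/\d\s+foo/;
my $sep  = qr/(?:(?!$term)[^\r\n])*/;

sub parse_list {
    my ($str) = @_;
    my @terms;
    pos($str) = 0;
    # The list must start with a term, otherwise there is nothing to return.
    $str =~ /\G($term)/gc or return;
    push @terms, $1;
    # Repeatedly consume "sep term"; /c keeps pos() if the match fails.
    while ($str =~ /\G$sep($term)/gc) {
        push @terms, $1;
    }
    return \@terms;
}

my $terms = parse_list('1 foo x 2 foo y 3 foo');
print join(', ', @$terms), "\n";   # 1 foo, 2 foo, 3 foo
```

Trailing text after the last term is left unconsumed here; a real parser would decide whether that is an error.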