monsieur_champs has asked for the wisdom of the Perl Monks concerning the following question:

Fellows
I was trying to capture several different patterns using some large and weird regular expressions, and came across a funny problem. I need to capture a single piece of the code into a variable, but I have a lot of capturing parentheses spread across the entire regular expression. So I tried something like this:

my ( $piece ) = $strange_thing =~ m/($large)|$wei($rd)|($stra)$nge/;

I know for sure that when something matches, I will get it in one of the $1, $2, ... variables, but what happens is this: when the string matches ($large), all goes fine and $piece contains what I need; but when $strange_thing matches any of the other alternatives, $piece ends up with nothing inside. I'm doing something really stupid, I know, but I can't see what. How can I use the same cage to capture many different birds, one at a time?

Thank you all and may the gods bless you all.


"In few words, translating PerlMonks documentation and best articles to other languages is like building a bridge to join other Perl communities into PerlMonks family. This makes the family bigger, the knowledge greather, the parties better and the life easier." -- monsieur_champs

Replies are listed 'Best First'.
Re: Many birds, single cage...
by Enlil (Parson) on Dec 09, 2003 at 20:34 UTC
    Here's one way:
    my $piece;
    $piece = $+ if $strange_thing =~ m/($large)|$wei($rd)|($stra)$nge/;
    Have a look at perldoc perlvar for more info on $+
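A small self-contained sketch of the $+ approach. The literal words large, wei, rd, stra and nge stand in for the interpolated patterns from the question, and the helper name last_capture is only illustrative:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text of the highest-numbered group
# that took part in the match, or undef when nothing matches.
sub last_capture {
    my ($text) = @_;
    my $piece;
    # Literal stand-ins for $large, $wei, $rd, $stra and $nge.
    if ( $text =~ m/(large)|wei(rd)|(stra)nge/ ) {
        $piece = $+;    # copy it before another match can overwrite it
    }
    return $piece;
}

for my $text (qw(large weird strange other)) {
    my $piece = last_capture($text);
    print "$text: ", ( defined $piece ? $piece : '(no match)' ), "\n";
}
```

Note that $+ only helps when each alternative contains exactly one group; with several groups in the same alternative it reports the highest-numbered one that was used.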

    -enlil

      Dear Enlil
      Thank you very much for reminding me of this special variable.
      It seems that this is the solution I was looking for. I'm just a little concerned about performance, but I will run a full test set on the new implementation tomorrow (it's time to go home). Can you please tell me in advance whether this can bite me on performance?

      May the gods bless you.

      If I had any more votes to give today, I'd ++ you right now. I'd forgotten about $+. Good catch.

Re: Many birds, single cage...
by BrowserUk (Patriarch) on Dec 09, 2003 at 22:02 UTC

    The best (for some definition of that term) solution I have found for the problem of having a variable number of capturing parens that may or may not be part of a specific match is to use $^N, some our variables and embedded (?{code}) blocks.

    our( $foo, $bar, $qux );
    $string =~ m[
        ( some regex )              (?{ $foo = $^N })
        other stuff
        ( a conditional capture )   (?{ $bar = $^N })
        | ( another possibility )   (?{ $bar = $^N })
        ...
        ( a final piece )           (?{ $qux = $^N })
    ]x;

    Using this method, when the regex engine has finished you know exactly where it has put whatever it captured, and you no longer have to play musical chairs trying to work it out.

    With care, it is extensible to extremely complex regexes with essentially any number of captures. You can also arrange that multiple captures resulting from repeating regexes be pushed onto an array. This is something that I know of no other usable way of doing.
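As a rough sketch of the "push repeated captures onto an array" idea, assuming a toy comma-separated input; the names @numbers and collect_numbers are made up for illustration and are not from the post:

```perl
use strict;
use warnings;

# Package variable, in the spirit of the "our" variables above.
our @numbers;

# Hypothetical helper: collects every (\d+) seen by a repeated group.
sub collect_numbers {
    my ($text) = @_;
    @numbers = ();
    # Each time the group (\d+) closes, the code block pushes its text,
    # which $^N holds at that point. Code blocks also run again if the
    # engine backtracks, so @numbers is only trusted after a successful
    # match; on failure the sub returns an empty list.
    return unless $text =~ m/ \A (?: (\d+) (?{ push @numbers, $^N }) ,? )+ \z /x;
    return @numbers;
}

print join( ' ', collect_numbers('10,20,30') ), "\n";
```

A plain (\d+) inside the repeated group would leave only the last repetition in $1, which is why the push happens inside the pattern.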

    There is a performance penalty with this technique, and I'm not keen on the use of globals, but until we get capture-to-named-vars (which seems to be a part of Perl 6 grammars), it is the least bad of a set of bad alternatives.


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
    Hooray!

Re: Many birds, single cage...
by duff (Parson) on Dec 09, 2003 at 20:35 UTC

    The first set of parens goes into $1, the second (($rd) in this case) goes into $2, and the third into $3. If you want only one of the parenthesized bits, then you have to explicitly check $1, $2, and $3 yourself. Something like this:

    $strange_thing =~ m/($large)|$wei($rd)|($stra)$nge/;
    my $piece = $1 || $2 || $3;

    Hope this helps!

      It would be better to check whether the variable is defined rather than whether it is true (via ||), as 0 is not true and could very well be the text that matched.
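One way to sketch the defined-ness check without naming every capture variable: in list context a successful match returns all the groups, with undef for those that did not take part, so grep can pick the first defined one. The pattern below is an assumed stand-in that lets the first group match a literal 0, and first_capture is a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the first defined capture, or undef
# when the match fails.
sub first_capture {
    my ($text) = @_;
    # List context yields ($1, $2, $3); a failed match yields ().
    my ($piece) = grep { defined } $text =~ m/(0|large)|wei(rd)|(stra)nge/;
    return $piece;
}

for my $text (qw(0 weird strange)) {
    my $piece = first_capture($text);
    print "$text: ", ( defined $piece ? $piece : '(no match)' ), "\n";
}
```

Because nothing is listed by number, the same line keeps working if more capturing groups are added to the pattern.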

      -enlil

      Dear duff
      Thank you, but I have nearly 50 capturing parentheses in my real regular expression. It is a complete grammar for my company's application command-line parser. This was the first solution I tried, and I rejected it as unworkable at that size (and what if I need one more capturing group?).

      Thank you for the effort and for taking the time to answer, anyway. This will surely be useful to another monk in the future.

        If it's a complete grammar, then something like Parse::RecDescent or another parser generator may be a good bet for future development.

        You can do it all with regular expressions and captures, but I'm not sure why you would want to.

        If a proper parsing module is not an option, you could match one token at a time in a loop. From there you could build a recursive descent parser of your own or somesuch.
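A minimal sketch of matching one token at a time in a loop, using \G with the /gc modifiers so each match resumes where the previous one stopped. The token names and the tiny command syntax are assumptions for illustration only, not the poster's real grammar:

```perl
use strict;
use warnings;

# Hypothetical tokenizer: returns a list of [ TYPE => text ] pairs.
sub tokenize {
    my ($input) = @_;
    my @tokens;
    while (1) {
        # /c keeps pos($input) in place when a branch fails to match.
        if    ( $input =~ /\G\s+/gc )            { next }
        elsif ( $input =~ /\G([A-Za-z_]\w*)/gc ) { push @tokens, [ WORD   => $1 ] }
        elsif ( $input =~ /\G(\d+)/gc )          { push @tokens, [ NUMBER => $1 ] }
        elsif ( $input =~ /\G(=)/gc )            { push @tokens, [ EQUALS => $1 ] }
        elsif ( $input =~ /\G\z/gc )             { last }
        else {
            die "Unexpected input at offset " . ( pos($input) || 0 ) . "\n";
        }
    }
    return @tokens;
}

print join( ' ', map { "$_->[0]($_->[1])" } tokenize('set speed=42') ), "\n";
```

Each branch has at most one capture, so the "which group matched?" question never arises; a hand-written recursive descent parser could consume this token list.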



        Christopher E. Stith
Re: Many birds, single cage...
by delirium (Chaplain) on Dec 09, 2003 at 20:44 UTC
    How about:

    my ( $piece ) = $strange_thing =~ m/($large|(?<=$wei)$rd|$stra(?=$nge))/;

    The lookbehind and lookahead assertions won't become part of $1.
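One caveat: Perl's lookbehind has historically required a pattern of fixed length, so (?<=$wei) only works when $wei qualifies. On Perl 5.10 or later, which postdates this thread, the branch-reset group (?|...) is another option: every alternative numbers its captures from $1 again. A sketch with literal stand-ins for the question's variables and a hypothetical helper name one_cage:

```perl
use strict;
use warnings;
use 5.010;    # branch reset (?|...) needs Perl 5.10 or later

# Hypothetical helper: whichever alternative matches, its capture
# lands in $1, so list assignment always picks up the right piece.
sub one_cage {
    my ($text) = @_;
    my ($piece) = $text =~ m/(?|(large)|wei(rd)|(stra)nge)/;
    return $piece;
}

for my $text (qw(large weird strange)) {
    print "$text: ", one_cage($text), "\n";
}
```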