John M. Dlugosz has asked for the wisdom of the Perl Monks concerning the following question:

Kanji suggested using enclosing angles for the pseudo-sigil to represent a file handle. I like that idea, since it's clear and obvious.

But it's less than elegant in the code. Everything has a prefix char, and here's one that has a postfix as well.

Consider how to simplify this regex:

/(^([\$\@*%]))|(^<.*>$)/
That is, what I really want is
/^([\$\@*%<])(.*)(>?)$/
with the stipulation that $3 is present only if $1 was <, and absent otherwise.

I can certainly do it in a line or 3, especially with lots of comments for clarity. But I'm wondering if there is a really cool way to do it in one succinct bite (wishing for grammars like Perl6 here...).
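A minimal sketch of the "line or three" approach, for comparison with the one-regex answers below. The helper name `parse_sigil` is hypothetical, and the rule that `>` must appear exactly when the type is `<` is checked in Perl after the match rather than inside the regex. Note that the name has to be matched non-greedily: a greedy `(.*)` would swallow the trailing `>`, leaving `$3` always empty.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (type, name) for '$foo', '@foo', '*foo',
# '%foo' or '<foo>', and an empty list (undef in scalar context) otherwise.
sub parse_sigil {
    my ($s) = @_;
    # \$ and \@ are escaped so that $@ is not interpolated into the class.
    # Non-greedy (.*?) lets (>?) capture a trailing '>' when there is one.
    return unless $s =~ /^([\$\@*%<])(.*?)(>?)$/;
    my ($type, $name, $close) = ($1, $2, $3);
    # The stipulation: '>' is present if and only if the type is '<'.
    return unless ($type eq '<') == ($close eq '>');
    return ($type, $name);
}

for my $s ('$foo', '<foo>', '<foo', '%bar>') {
    my ($type, $name) = parse_sigil($s);
    if (defined $type) {
        print "$s => type=$type name=$name\n";
    }
    else {
        print "$s => no match\n";
    }
}
```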

On a more meditative note, I see that a more powerful pattern engine can make things clearer simply because you can, in analogy with English, sum up your selection with a simple statement of intent, rather than groping for the right adjective and then having to speak at length about this and that special case.

I see this in documentation, too. If the rule is simple it not only makes the code simple, but makes the documentation easy too.

—John

Re: conditional match in regex
by tachyon (Chancellor) on Nov 04, 2002 at 21:45 UTC
    /^([\$\@*%])(.*)$/ or /^(<)(.*)(>)$/
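A sketch of how tachyon's two-pattern approach might be used; the wrapper name `parse_handle` is hypothetical. Because `or` short-circuits, `$1` and `$2` always come from whichever pattern actually succeeded, so the caller sees type and name in the same variables for both forms.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the two alternative patterns.
sub parse_handle {
    my ($s) = @_;
    # First pattern: plain sigil. Second: <name>, where $3 is always '>'.
    if ($s =~ /^([\$\@*%])(.*)$/ or $s =~ /^(<)(.*)(>)$/) {
        return ($1, $2);    # type, name
    }
    return;
}

for my $s ('@list', '<STDIN>', '<oops') {
    my ($type, $name) = parse_handle($s);
    print defined $type ? "$s => type=$type name=$name\n" : "$s => no match\n";
}
```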

    cheers

    tachyon

    s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

Re: conditional match in regex
by jryan (Vicar) on Nov 04, 2002 at 23:00 UTC

    Well, here's how I would do it; this regex will populate $1 with the sigil, and $2 with the name if it is a variable like "$foo"; in the "<foo>" case, $1 will be undef and $2 will contain the name.

    / ^                      # start
      (?:([\$\@\*\%])|<)     # leading sigil or <
      (\w+)                  # name
      (?(?{$1}) |> )         # if there was a leading sigil, match nothing;
                             # otherwise, match >
      $                      # end
    /x;
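A quick way to check the first pattern's behaviour is to wrap it in a small helper; `parse_jryan1` is a hypothetical name used only for testing. For `<foo>` the sigil group never participates, so the code condition `(?{$1})` is false and the no-branch `>` is required. Under `/x`, the comments must avoid `/` (the delimiter) and `$` variables, since comments in a regex literal are still interpolated.

```perl
use strict;
use warnings;

# Returns (sigil, name); the sigil is undef for the <name> form.
# Returns an empty list when the string does not match.
sub parse_jryan1 {
    my ($s) = @_;
    return unless $s =~ / ^                    # start
                          (?:([\$\@\*\%])|<)   # leading sigil or <
                          (\w+)                # name
                          (?(?{$1}) |> )       # sigil seen: match nothing; else match >
                          $                    # end
                        /x;
    return ($1, $2);
}

for my $s ('$foo', '<foo>', '<foo', '$foo>') {
    my @r = parse_jryan1($s);
    print @r ? "$s => sigil=" . ($r[0] // 'undef') . " name=$r[1]\n"
             : "$s => no match\n";
}
```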

    Another way might be to use this one:

    / ^                          # start
      ([\$\@\*\%<])              # sigil
      (\w+)                      # name
      (?(?{$1 eq '<'}) >| )      # if the sigil is a '<', match the end;
                                 # otherwise match nothing
      $                          # end
    /x;
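A sketch exercising the second pattern, with hypothetical helpers `parse_jryan2` and `rebuild`. `rebuild` illustrates the point made further down: the closing `>` never needs its own capture, because it can be derived from `$1` alone.

```perl
use strict;
use warnings;

# Returns (sigil, name); here the sigil is '<' for the <name> form.
sub parse_jryan2 {
    my ($s) = @_;
    return unless $s =~ / ^                       # start
                          ([\$\@\*\%<])           # sigil
                          (\w+)                   # name
                          (?(?{$1 eq '<'}) >| )   # '<' requires the closing '>'
                          $                       # end
                        /x;
    return ($1, $2);
}

# Hypothetical: reconstruct the original text from (sigil, name) only.
sub rebuild {
    my ($sigil, $name) = @_;
    return $sigil eq '<' ? "<$name>" : "$sigil$name";
}

for my $s ('%hash', '<FH>', '<FH') {
    my ($sigil, $name) = parse_jryan2($s);
    print defined $sigil ? "$s => " . rebuild($sigil, $name) . "\n"
                         : "$s => no match\n";
}
```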

    This is different in that in the "<foo>" case, $1 will be '<'.

    In either case, there's no real reason to capture $3: it could only ever be >, and whether it is present already follows from $1.

      This also seems to work (at least in perl 5.6.1):
      / ^
        ( [\$%@*] | (<) )   # group 1: the sigil; group 2 is set only for <
        ( .* )              # group 3: the name
        (?(2) > )           # (2) tests whether group 2, the (<) above, matched
        $
      /x;
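A sketch of the group-number condition `(?(2)...)` in use; `parse_cond_group` is a hypothetical name. No code block is needed, since the condition simply asks whether capture group 2 took part in the match. The cost, as noted in the reply below, is the numbering: the nested `(<)` pushes the name into `$3`.

```perl
use strict;
use warnings;

sub parse_cond_group {
    my ($s) = @_;
    return unless $s =~ / ^
        ( [\$%\@*] | (<) )   # group 1: the sigil; group 2 only for <
        ( .* )               # group 3: the name
        (?(2) > )            # require > only if group 2 matched
        $ /x;
    # The name lands in $3, not $2, because (<) is group 2.
    return ($1, $3);
}

for my $s ('*glob', '<ARGV>', '<ARGV') {
    my ($type, $name) = parse_cond_group($s);
    print defined $type ? "$s => type=$type name=$name\n" : "$s => no match\n";
}
```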


        Yes, that will work, but yours has the problem that it creates $1, $2, and $3. I wanted to limit the regular expression so that $1 = type, and $2 = name.
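For comparison, a sketch using branch reset `(?|...)`, a feature that only arrived in Perl 5.10, well after this thread. Inside `(?|...)` each alternative numbers its captures from the same starting point, so both forms leave the type in `$1` and the name in `$2` without any code block. The helper name `parse_branch_reset` and the `\w+` name rule are assumptions carried over from the patterns above.

```perl
use strict;
use warnings;
use 5.010;    # branch reset (?|...) requires Perl 5.10 or later

sub parse_branch_reset {
    my ($s) = @_;
    return unless $s =~ / ^
        (?|  ([\$\@*%]) (\w+)     # plain sigil form
          |  (<)        (\w+) >   # angle form; the closing > is not captured
        )
        $ /x;
    return ($1, $2);    # type, name in both branches
}

for my $s ('$foo', '<foo>', '$foo>') {
    my ($type, $name) = parse_branch_reset($s);
    say defined $type ? "$s => type=$type name=$name" : "$s => no match";
}
```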
      Thanks, that example of using code in a regex is exactly what I was wondering.

      The perlre page states that (?{ code }) is always successful, but also says that it may be used in a conditional match.

      So I'm guessing that, when used alone, the code runs only for its side effects and always succeeds; but when it is used as the condition of a (?(condition)yes-pattern[|no-pattern]), its return value does indeed serve as the condition.
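A small experiment confirming that reading of perlre: a standalone `(?{ 0 })` runs but cannot make the match fail, whereas the same code used as a condition selects the yes- or no-branch from its return value. The test strings are arbitrary.

```perl
use strict;
use warnings;

# Standalone: the block returns a false value, yet the match still succeeds.
my $alone = 'abc' =~ /a(?{ 0 })bc/ ? 1 : 0;

# As a condition: false selects the no-pattern 'b', true selects 'x'.
my $false_b = 'abc' =~ /a(?(?{ 0 })x|b)c/ ? 1 : 0;    # takes the 'b' branch
my $false_x = 'axc' =~ /a(?(?{ 0 })x|b)c/ ? 1 : 0;    # 'x' is not allowed here
my $true_x  = 'axc' =~ /a(?(?{ 1 })x|b)c/ ? 1 : 0;    # takes the 'x' branch

print "alone=$alone false_b=$false_b false_x=$false_x true_x=$true_x\n";
```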

        Yep, you got it.