WarrenBullockIII has asked for the wisdom of the Perl Monks concerning the following question:

I am having a little trouble understanding regular expressions. My question is: how do you know when it is a good time to use parentheses in a pattern? I know that if I would like to store a value in memory then I can use them. An exercise in the Learning Perl book says: Make a pattern that matches only lines containing either the word fred or wilma, followed by some whitespace, and then the word flintstone. So it should match the string: I am fred flintstone (with one or more spaces or tabs between the names). Well, my idea of what the code should look like is this:
/\bfred|wilma\s+flintstone\b/
However, the book lists the code like this:
/\b(fred|wilma)\s+flintstone\b/
I do not understand why the parentheses are needed here. But I do see that if I take them away it causes problems because if I type just flintstone at the prompt it will not match. However, if I leave the parentheses in the pattern as the book does then flintstone will match. Could someone please clarify this for me?

-Warren E Bullock III wbullock@twcny.rr.com

Edit kudra, 2002-06-08 Changed title, added code tags

Re: Regular Expressions
by merlyn (Sage) on Jun 08, 2002 at 03:06 UTC
    You use parens in a regular expression in the same way you need parens in (2+3)*4 in math... it's to handle precedence rules when they are not in the order you need.
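
    A minimal sketch of that analogy, not part of the original reply: the cat/dog pattern and the variable names are made up for illustration. In a regex, "|" binds more loosely than putting characters next to each other, much as "+" binds more loosely than "*".

```perl
use strict;
use warnings;

# Arithmetic: multiplication binds tighter than addition.
my $without = 2 + 3 * 4;      # parsed as 2 + (3 * 4)
my $with    = (2 + 3) * 4;    # parens force the addition first
print "without parens: $without, with parens: $with\n";

# Regex: sequencing binds tighter than alternation.
# This reads as "starts with cat" OR "ends with dog".
my $loose = qr/^cat|dog$/;
# Parens make both anchors apply to whichever word is chosen.
my $tight = qr/^(cat|dog)$/;

for my $s ('cat', 'catalog', 'hotdog') {
    printf "%-8s loose=%-3s tight=%s\n", $s,
        ($s =~ $loose ? 'yes' : 'no'),
        ($s =~ $tight ? 'yes' : 'no');
}
```

    Here "catalog" and "hotdog" match only the loose pattern, because each satisfies just one side of the "|".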

    -- Randal L. Schwartz, Perl hacker

Re: Regular Expressions
by mephit (Scribe) on Jun 08, 2002 at 03:39 UTC
    Without the parens, the alternation will appear to be "greedy": each alternative stretches as far as it can on its side of the "|". The perlre man page says:
    You can specify a series of alternatives for a pattern using "|" to separate them, so that `fee|fie|foe' will match any of "fee", "fie", or "foe" in the target string (as would `f(e|i|o)e'). The first alternative includes everything from the last pattern delimiter ("(", "[", or the beginning of the pattern) up to the first "|", and the last alternative contains everything from the last "|" to the next pattern delimiter. That's why it's common practice to include alternatives in parentheses: to minimize confusion about where they start and end.
    I.e., without the parens, it's like checking for either /\bfred/ or /wilma\s+flintstone\b/, which isn't What You Want: the string "I am fred" will match successfully, even though there's no "flintstone" to be found anywhere in the string.
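
    A quick way to see this for yourself is to run both patterns from the question against a few lines. This is only a sketch; the test strings and variable names are invented for the illustration.

```perl
use strict;
use warnings;

# The two patterns from the question.
my $no_parens   = qr/\bfred|wilma\s+flintstone\b/;
my $with_parens = qr/\b(fred|wilma)\s+flintstone\b/;

# Hypothetical sample lines, including tabs and multiple spaces.
my @tests = (
    'I am fred',
    'I am fred flintstone',
    "wilma \t flintstone",
    'I am wilma',
    'flintstone',
);

for my $line (@tests) {
    printf "%-24s no_parens=%-3s with_parens=%s\n", "'$line'",
        ($line =~ $no_parens   ? 'yes' : 'no'),
        ($line =~ $with_parens ? 'yes' : 'no');
}
```

    The line "I am fred" is the telling case: it matches the version without parens but not the book's version.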

    In short, use parens in an alternating pattern to avoid confusion. HTH

    --

    There are 10 kinds of people -- those that understand binary, and those that don't.

Re: Regular Expressions
by BUU (Prior) on Jun 08, 2002 at 03:19 UTC
    A side effect of the parens is that they capture the matched value for backreferencing.
      If you want grouping without capturing, use (?:...) instead of the (...) pattern.

      In list context, you can throw away unwanted captures by putting undef on the left-hand side.

       my ($first, undef, $third) = m/(a)(b)(c)/;  # matches against $_; the second capture is discarded

      -- stefp -- check out TeXmacs wiki

Re: Parenthesis usage in Regular Expressions
by Molt (Chaplain) on Jun 08, 2002 at 21:44 UTC

    It's also worth noting that some of us have the habit of using (?: ... ) for parentheses which aren't actually going to capture anything into $1, $2 etc.

    This makes it clearer what's actually going to be returned (it's just the stuff in the normal brackets), whilst still allowing the grouping effect you need here.
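
    As a rough illustration of the difference, here is the pattern from the question with an extra capture around the surname. That extra capture and the sample line are assumptions added for the example, not something from the book.

```perl
use strict;
use warnings;

my $line = 'I am wilma   flintstone';

# Capturing group around the alternation: the first name lands in $1,
# which pushes the surname along to $2.
my @caps_all = $line =~ /\b(fred|wilma)\s+(flintstone)\b/;

# Non-capturing group: still groups the alternation, but only the
# surname is captured, so it is the sole element returned.
my @caps_some = $line =~ /\b(?:fred|wilma)\s+(flintstone)\b/;

print "capturing:     @caps_all\n";
print "non-capturing: @caps_some\n";
```

    With (?: ... ) the numbering of the captures you do care about stays put if you later add or remove a grouping.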