kiat has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks,

Greetings :-)

I'm trying to match a pattern that consists of a name, a pair of parentheses and a number within the parentheses.

The basic pattern is as follows:

name1(6)

However, I would like the match to work for a single occurrence of the pattern as well as for repeated occurrences of that pattern delimited by a colon:

namexx(10)
name2(8):name3(3)
name4(12):name5(3):name6(3)


The code I've come up with below doesn't work.
error("Input error.") if ($input !~ /(.+?\(\d+\):?.+?\(\d+\))+/);
Can someone enlighten me?

kiat

Edit kudra, 2002-05-11 Replaced font with i

Re: One or more of a pattern...
by samtregar (Abbot) on May 11, 2002 at 04:48 UTC
    Your regex has several problems. First, your regex requires at least two instances of foo(num) to match - it can't match just one because the next expressions in the pattern are non-optional.
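The first problem is easy to see by running the original pattern against the three sample inputs from the question. This is only a minimal sketch; the `%accepted` hash is a name made up for illustration:

```perl
use strict;
use warnings;

# The pattern from the original question, unchanged.
my $original = qr/(.+?\(\d+\):?.+?\(\d+\))+/;

# Sample inputs taken from the question.
my @inputs = (
    'namexx(10)',
    'name2(8):name3(3)',
    'name4(12):name5(3):name6(3)',
);

# Record whether each input is accepted (hypothetical helper hash).
my %accepted;
$accepted{$_} = ($_ =~ $original) ? 1 : 0 for @inputs;

printf "%-30s %s\n", $_, ($accepted{$_} ? 'match' : 'no match') for @inputs;
```

The single-item input is the one that fails, because the pattern always needs a second name(number) after the first.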

    Also, your regex uses .+? in ways that allow the pattern to defeat your requirements. Consider the string "foo(10)bar(10)", which I assume shouldn't match. It will still match /.+?\(\d+\)/, because .+? can match "foo(10)bar" and the rest of the pattern can then succeed on just "(10)". I fixed this by changing your .+? to [^\(]+, which means "one or more characters other than an open paren". You might be able to narrow that down further.
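A small sketch of the .+? problem, assuming a single-item pattern that is anchored at both ends so the name part has to account for everything before the final (number). The variable names are illustrative only:

```perl
use strict;
use warnings;

my $bad_input = 'foo(10)bar(10)';   # two items with no colon between them

# Anchored single-item patterns: the name part must cover everything
# that comes before the final "(number)".
my $lazy_dot = qr/\A.+?\(\d+\)\z/;      # name part may contain "("
my $no_paren = qr/\A[^\(]+\(\d+\)\z/;   # name part may not contain "("

my $lazy_matches     = $bad_input =~ $lazy_dot ? 1 : 0;
my $no_paren_matches = $bad_input =~ $no_paren ? 1 : 0;

print "lazy dot: ", ($lazy_matches     ? "match" : "no match"), "\n";
print "no paren: ", ($no_paren_matches ? "match" : "no match"), "\n";
```

The lazy quantifier only changes the order in which lengths are tried. It still grows until the rest of the pattern can succeed, so it ends up swallowing "foo(10)bar".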

    Finally, you didn't anchor your pattern which means that it is free to match anywhere in the string. I added \A to the front and \z to the end to force the regex to span the whole string.

    Here's a replacement pattern that works, documented using the /x modifier that allows whitespace and comments:

    /
        \A              # start at the beginning of the string
        [^\(]+          # match one or more non-( characters
        \(\d+\)         # match a (number) expression
        (               # start the optional part
            :           # a single, required colon
            [^\(]+      # match one or more non-( characters
            \(\d+\)     # match a (number) expression
        )*              # match this subexpression zero or more times
        \z              # end at the end of the string
    /x;
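For anyone who wants to try the pattern, here is a minimal sketch that compiles it with qr// and runs it over the inputs from the question plus the "foo(10)bar(10)" case. The `$valid`, `@inputs` and `%ok` names are assumptions for the example, not part of the original code:

```perl
use strict;
use warnings;

# The replacement pattern, compiled once with /x.
my $valid = qr/
    \A
    [^\(]+ \(\d+\)          # first name(number)
    ( : [^\(]+ \(\d+\) )*   # zero or more :name(number)
    \z
/x;

my @inputs = (
    'namexx(10)',
    'name2(8):name3(3)',
    'name4(12):name5(3):name6(3)',
    'foo(10)bar(10)',
    'name(1)::name(3)',
);

# Record whether each input passes validation (hypothetical helper hash).
my %ok;
$ok{$_} = ($_ =~ $valid) ? 1 : 0 for @inputs;

printf "%-30s %s\n", $_, ($ok{$_} ? 'match' : 'no match') for @inputs;
```

Note that [^\(]+ still lets a name contain a colon, so the doubled-colon input is accepted. That is one place where the character class could be narrowed further.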

    -sam

    UPDATE: My first attempt was bad news. Just goes to show you can never think too carefully about a regex!

Re: One or more of a pattern...
by tachyon (Chancellor) on May 11, 2002 at 05:04 UTC
    while(<DATA>) {
        print /^([-\w' ]+\(\d+\)(:[-\w' ]+\(\d+\))*)$/
            ? "match capture >$1< $_\n"
            : "no match $_\n";
    }
    __DATA__
    namexx(10)
    name2(8):name3(3)
    name4(12):name5(3):name6(3)
    O'Keefe(1)
    Jo-Anne Smith(2)
    F$%^Y@^(1)
    name(1)name(3)
    name(1)::name(3)

    cheers

    tachyon

    s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

      Don't use ^ and $ when you mean \A and \z!

      -sam

        In this case, where the string the pattern is being run against was obtained from a while (<FILE>) (and should, therefore, be a single line), how do they differ and why do you prefer \A/\z?
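As a rough illustration of the difference being asked about (a sketch, not sam's answer): without /m, ^ behaves like \A, but $ also matches just before a trailing newline, while \z matches only at the very end of the string. With /m, ^ and $ additionally match at internal line boundaries, whereas \A and \z do not change. The variable names below are made up for the example:

```perl
use strict;
use warnings;

my $line = "name1(6)\n";            # as read from a filehandle, not chomped
my $two  = "name1(6)\nname2(8)";    # a string with an embedded newline

# $ tolerates one trailing newline; \z does not.
my $dollar_unchomped = $line =~ /^\w+\(\d+\)$/   ? 1 : 0;
my $z_unchomped      = $line =~ /\A\w+\(\d+\)\z/ ? 1 : 0;

# Under /m, ^ and $ match at line boundaries; \A and \z still mean
# the start and end of the whole string.
my $dollar_multiline = $two =~ /^\w+\(\d+\)$/m   ? 1 : 0;
my $z_multiline      = $two =~ /\A\w+\(\d+\)\z/m ? 1 : 0;

print "unchomped line, ^...\$:   $dollar_unchomped\n";
print "unchomped line, \\A...\\z: $z_unchomped\n";
print "two lines, ^...\$ with /m:   $dollar_multiline\n";
print "two lines, \\A...\\z with /m: $z_multiline\n";
```

One practical consequence is that a line read with while (<FILE>) needs a chomp before a \z-anchored pattern can match it.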