b_e_n_82 has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I have a string like "-UK 123 123-UK 123-UK 123-UK" where I want to match all '123-UK' repetitions (in the real problem the repetitions could be any number). I was wondering if someone could enlighten me why these two expressions do not give the same results when evaluating $1:

/((?:\d+-UK\W?)+)/

/((?:\d+-UK\W?)*)/

The former works (giving $1 as "123-UK 123-UK 123-UK") but the latter does not ($1 is empty). In the latter expression the '*' makes the group optional, but since the group exists I would expect it to match.

Many thanks

Re: Regular expression * vs +
by moritz (Cardinal) on May 20, 2008 at 09:06 UTC
    The * lets the expression match the empty string. The regex engine first tries at pos 0 (i.e. the start of the string), which is the lone dash. It tries to match the regex, succeeds with zero occurrences of (?:\d+-UK\W?), stores the matched (empty) string in $1 and reports success.
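    To make the point above concrete, here is a minimal sketch that runs both of the OP's patterns against the sample string and prints where each match starts. The helper name first_match is made up for illustration; @- is Perl's built-in array of match start offsets.

```perl
use strict;
use warnings;

my $txt = '-UK 123 123-UK 123-UK 123-UK';

# Hypothetical helper: returns the captured text and the offset
# at which the overall match began ($-[0]).
sub first_match {
    my ($re) = @_;
    return unless $txt =~ $re;
    return ($1, $-[0]);
}

my ($plus, $plus_at) = first_match(qr/((?:\d+-UK\W?)+)/);
my ($star, $star_at) = first_match(qr/((?:\d+-UK\W?)*)/);

print "+ : |$plus| at offset $plus_at\n";   # + : |123-UK 123-UK 123-UK| at offset 8
print "* : |$star| at offset $star_at\n";   # * : || at offset 0
```

    The * version never gets as far as offset 8, because the empty match at offset 0 already counts as success.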
Re: Regular expression * vs +
by mwah (Hermit) on May 20, 2008 at 09:54 UTC

    In addition to moritz' hint, you could look into the matching process by issuing (e.g.):

    ...
    my $txt = '-UK 123 123-UK 123-UK 123-UK';
    print "|$1|\n" while $txt =~ /((?:\d+-UK\W?)*)/g;
    ...

    Regards

    mwa

      Close, but not sure OP wants this (from mwah's example):

      687551.pl syntax OK
      perl 687551.pl
      ||
      ||
      ||
      ||
      ||
      ||
      ||
      ||
      |123-UK 123-UK 123-UK|
      ||

      Whereas:

      my $txt = '-UK 123 123-UK 123-UK 123-UK abc-oops!';
      print "|$1|\n" while $txt =~ /((?:\d+-UK\s))/g;

      produces

      perl 687551.pl
      |123-UK |
      |123-UK |
      |123-UK |

        I think the OP struggled over the difference between

        ...
        my $txt = '-UK 123 123-UK 123-UK 123-UK';
        print "|$1| - match until pos:" . pos($txt) . "\n"
            while $txt =~ /((?:\d+-UK\W?)+)/g;
        ...

        and

        ...
        my $txt = '-UK 123 123-UK 123-UK 123-UK';
        print "|$1| - match until pos:" . pos($txt) . "\n"
            while $txt =~ /((?:\d+-UK\W?)*)/g;
        ...

        which is the difference between (expr)+ and (expr)*, the latter matching everywhere and (of course) at 'expr', the former matching only at 'expr'.
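        As a rough way to see "matching everywhere" versus "matching only at 'expr'", the following sketch collects every /g match in list context and counts how many are empty. The array names are arbitrary; the string and patterns are the ones from this thread.

```perl
use strict;
use warnings;

my $txt = '-UK 123 123-UK 123-UK 123-UK';

# In list context, m//g with one capture group returns every $1 in turn.
my @plus = $txt =~ /((?:\d+-UK\W?)+)/g;
my @star = $txt =~ /((?:\d+-UK\W?)*)/g;

# grep in scalar context yields the number of non-empty captures.
printf "+ : %d match(es), %d non-empty\n", scalar @plus, scalar grep { length } @plus;
printf "* : %d match(es), %d non-empty\n", scalar @star, scalar grep { length } @star;
# + : 1 match(es), 1 non-empty
# * : 10 match(es), 1 non-empty
```

        The nine extra matches for * are the empty ones at offsets 0 through 7 and at the end of the string.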

        (maybe I misinterpreted his intention)

        Regards

        mwa

Re: Regular expression * vs +
by carol (Beadle) on May 20, 2008 at 15:16 UTC
    since the group exists I would expect it to match
    Yes: the group exists; in that you are right. But don't forget the empty string at the start also exists. Both have the potential to make a match happen. As the engine works left-to-right, the empty string does the job this time.
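    A small sketch of the flip side, assuming a hypothetical input that begins with a repetition: when the group can already match at offset 0, the greedy * takes it, and both patterns agree.

```perl
use strict;
use warnings;

# Hypothetical input with nothing in front of the first repetition.
my $txt = '123-UK 123-UK 123-UK';

my ($plus) = $txt =~ /((?:\d+-UK\W?)+)/;
my ($star) = $txt =~ /((?:\d+-UK\W?)*)/;

print "+ : |$plus|\n";   # + : |123-UK 123-UK 123-UK|
print "* : |$star|\n";   # * : |123-UK 123-UK 123-UK|
print "same result\n" if $plus eq $star;
```

    So the empty string only "wins" when the first position in the string offers nothing better.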
Re: Regular expression * vs +
by b_e_n_82 (Initiate) on May 20, 2008 at 16:05 UTC
    Great response from everyone - that completely clears up my misunderstanding - Thank you
Re: Regular expression * vs +
by carol (Beadle) on May 20, 2008 at 14:58 UTC
    Any string matches /($whatever)*/.
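    A quick check of that claim, as a sketch: $whatever below is just a stand-in pattern and the input list is invented for illustration. Every string matches, and the length consumed ($+[0] - $-[0]) shows that most of those matches are empty.

```perl
use strict;
use warnings;

my $whatever = qr/\d+-UK/;                        # stand-in pattern
my @inputs   = ('', 'abc', '-UK 123', '123-UK');  # made-up test strings

my @consumed;
for my $s (@inputs) {
    if ($s =~ /($whatever)*/) {
        my $len = $+[0] - $-[0];   # length of the overall match
        push @consumed, $len;
        print "'$s' matches, $len character(s) consumed\n";
    }
    else {
        print "'$s' does not match\n";   # never reached
    }
}
# '' matches, 0 character(s) consumed
# 'abc' matches, 0 character(s) consumed
# '-UK 123' matches, 0 character(s) consumed
# '123-UK' matches, 6 character(s) consumed
```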