in reply to question about the star(*) quantifier

To elaborate on the previous answer, Perl's regex engine finds the left-most match, stopping as soon as it gets a match. In this case that is the zero-length string at the beginning of your text. Other regex engines differ in how they choose among matches that start at the same position (POSIX engines prefer the longest one), but the starting position is decided first, so even a leftmost-longest engine would match the empty string here rather than your expected "1000".
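To see where the match actually lands, here is a minimal sketch. The input string is an assumption (the original poster's exact text is not quoted in this thread); `@-` is Perl's built-in array of match start offsets.

```perl
use strict;
use warnings;

# Hypothetical input: the original poster's exact string is not shown here.
my $text = "I fear that 1000 is too many";

my ($captured, $offset);
if ($text =~ /(\d*)/) {
    # $-[0] holds the offset at which the whole match begins.
    ($captured, $offset) = ($1, $-[0]);
    printf "captured [%s] at offset %d (length %d)\n",
        $captured, $offset, length $captured;
}
# prints: captured [] at offset 0 (length 0)
```

The match succeeds at offset 0 with nothing in it, so the engine never gets as far as the digits.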

A good book about regexes is Mastering Regular Expressions, which answers this question, and many others, in much more detail. I highly recommend it for all Perl programmers.

-sam

Re^2: question about the star(*) quantifier
by sgt (Deacon) on Dec 21, 2006 at 20:03 UTC
    Hi Sam,
    Perl's regex engine finds the left-most match

    I would have phrased it like this: like most engines, Perl's looks by default for the longest leftmost match.

    % stephan@labaule (/home/stephan) %
    % echo "okay stephan do test it!" |
    > perl -lne 'm!(steph.?)!; print "seen: [$1]\n"'
    seen: [stepha]

    Actually, POSIX mandates the longest possible match for alternations, as the alternatives all start at the same place. The owl book calls that POSIX NFA.

    % stephan@labaule (/home/stephan) %
    % echo "okay stephan do test it!" |
    > perl -lne 'm!(steph|steph.?)!; print "seen: [$1]\n"'
    seen: [steph]

    so yes, perl's is not POSIX NFA (for various efficiency reasons)

    Actually, not many engines follow POSIX on this: mostly the system libraries on Un*x (ugly, because too ancient), plus ksh's, and even the Hackerlab library (which, on second thought, seems normal, as it was supposed to be a drop-in replacement for the C lib on Un*x, and hence POSIX). Tcl's is a hybrid, so I am not sure about it. And essentially every engine born after Perl 5 has copied Perl's.

    regards --stephan
    by the way ksh notation is interesting (zsh can masquerade as ksh for this if you don't have a ksh to try)
      I first wrote "longest left-most" and then revised it to just "left-most" when I remembered that Perl stops at the first matching alternative, longest or not. Maybe it's useful to think of it as "longest" anyway, but I prefer "greedy".
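A small sketch of that distinction, reusing the "okay stephan do test it!" string from the shell examples in this thread. The variable names are only illustrative.

```perl
use strict;
use warnings;

my $text = "okay stephan do test it!";

# Same two alternatives, tried in a different order.
my ($short_first) = $text =~ /(steph|steph.?)/;
my ($long_first)  = $text =~ /(steph.?|steph)/;

# A quantifier is greedy by default; adding ? makes it lazy.
my ($greedy) = $text =~ /(steph.?)/;
my ($lazy)   = $text =~ /(steph.??)/;

print "short first: [$short_first]\n";  # short first: [steph]
print "long first:  [$long_first]\n";   # long first:  [stepha]
print "greedy:      [$greedy]\n";       # greedy:      [stepha]
print "lazy:        [$lazy]\n";         # lazy:        [steph]
```

The alternation result depends only on which alternative is listed first, while the quantifier result depends on greediness. Neither is "longest" in the POSIX sense.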

      -sam

Re^2: question about the star(*) quantifier
by snowsky (Initiate) on Dec 21, 2006 at 17:48 UTC

    Thanks for all your quick replies, but I still have a question.

    The first match of the pattern /(\d*)/ should be a number, right, instead of the zero-length string? In my case, it should find the character '1' first and assign the value '1' to $1.

    Please advise. :)

      Nope. \d* means "zero or more digits", not "one or more digits". It is perfectly valid for \d* to match the empty string, which is "zero digits". If you want to be sure to match a number, use \d+, which means "one or more digits".
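A quick side-by-side, as a sketch. The input string is made up for illustration, since the original text is not quoted here.

```perl
use strict;
use warnings;

# Hypothetical input, not the original poster's string.
my $text = "I fear that 1000 is too many";

# In list context a successful match returns its captures.
my ($star) = $text =~ /(\d*)/;   # zero or more digits
my ($plus) = $text =~ /(\d+)/;   # one or more digits

print "star: [$star]\n";  # star: []
print "plus: [$plus]\n";  # plus: [1000]
```

With \d+ the empty string at the start no longer qualifies, so the engine keeps moving right until it reaches the digits.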

      -sam

      It's already been said above, but I'll restate it:

      \d* means explicitly: "Match 0 or more digits"

      That matches any sequence of characters that is "0 or more digits".

      At the very beginning of your string (leftmost portion), a string matches that - an empty string '' right before "I fear that...".
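To watch the engine try each position in turn, one option is to collect every match with /g. This is only a sketch on a short made-up string, not the original text.

```perl
use strict;
use warnings;

# Hypothetical input, not the original poster's string.
my $text = "ab 12";

# In list context, /g returns the capture from every successive match.
my @all      = $text =~ /(\d*)/g;
my @nonempty = grep { length } @all;

print "first match: [$all[0]]\n";    # first match: []
print "non-empty:   [@nonempty]\n";  # non-empty:   [12]
```

The digits do match eventually, but only after empty matches at the earlier positions, and without /g only the very first match is ever used.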

      Update: I guess samtregar beat me to it, but our message is the same :)



      --chargrill
      s**lil*; $*=join'',sort split q**; s;.*;grr; &&s+(.(.)).+$2$1+; $; = qq-$_-;s,.*,ahc,;$,.=chop for split q,,,reverse;print for($,,$;,$*,$/)

        I see now.

        Thanks for all your answers and good explanations.

        Thanks again.

        Cindy

Re^2: question about the star(*) quantifier
by webfiend (Vicar) on Dec 22, 2006 at 22:24 UTC