I recently received some puzzled looks over my answer to this question. My regex was:
($wanted) = $line =~ /(\S*)/;
Two people questioned it explicitly; one of them said I was missing the ^ anchor.

This is perhaps the trickiest lesson to learn about regexes, partly because it's so simple that it is easily misinterpreted. The * quantifier matches zero or more times. Because zero is allowed, it will always succeed. It is greedy, so it takes as much as it can while still letting the rest of the regex succeed. Most importantly, the match starts as early in the string as possible, and "earliest" beats "longest".
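Here is a small sketch that checks those three properties of x* against a few made-up sample strings. It uses the standard @- array ($-[0] is the offset where the last successful match started); the strings and the %result hash are just for illustration.

```perl
use strict;
use warnings;

# Properties of x* checked below:
#   1. it always succeeds (zero x's is a valid match),
#   2. it is greedy (takes every x available at its starting point),
#   3. it starts as early (leftmost) as possible, even if that means matching nothing.
my %result;
for my $str ("hello", "xxxy", "yxxx") {
    $str =~ /(x*)/ or die "x* failed on '$str'";   # never triggers: x* cannot fail
    $result{$str} = [ $-[0], $1 ];                   # [start offset, captured text]
    printf "%-6s matched at offset %d: '%s'\n", $str, $-[0], $1;
}
```

All three strings match at offset 0. For "xxxy" the capture is "xxx", but for "yxxx" it is the empty string: the engine is satisfied at offset 0 and never looks at the x's later in the string.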

In "this is cool", the regex matches "this". In "  is this cool?" it matches the empty string, but successfully. Why doesn't it match "is"? Because \S* matches at the very beginning of the string, where there are (quite plainly) zero non-whitespace characters. That is already a successful match, so the engine never tries a later position. The "classic" example of this trickery is:
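To see why the ^ anchor would not have changed anything, this sketch runs the original regex, the ^-anchored version, and a \S+ variant over both example strings. The \S+ variant is my own illustration of "at least one non-whitespace character", not something from the original question.

```perl
use strict;
use warnings;

# [label, compiled pattern]; each pattern captures the "word" it finds.
my @patterns = (
    [ '\S*'  => qr/(\S*)/  ],   # the original regex
    [ '^\S*' => qr/^(\S*)/ ],   # with the suggested ^ anchor
    [ '\S+'  => qr/(\S+)/  ],   # requires at least one non-whitespace char
);

my %got;   # $got{$line}{$label} = captured text
for my $line ("this is cool", "  is this cool?") {
    for my $p (@patterns) {
        my ($label, $re) = @$p;
        my ($wanted) = $line =~ $re;
        $got{$line}{$label} = $wanted;
        printf "%-18s %-6s => '%s'\n", "'$line'", $label, $wanted;
    }
}
```

The \S* and ^\S* rows agree on both strings, because an unanchored \S* already succeeds at offset 0. Only \S+, which cannot succeed with zero characters, moves forward to "is".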

$_ = "fred xxxxxxx barney"; s/x*//; print; # fred xxxxxxx barney
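A runnable version of the classic example, set next to the x+ form for contrast. It records where each pattern starts matching via $-[0]; the variable names are just for illustration.

```perl
use strict;
use warnings;

my $orig = "fred xxxxxxx barney";

# Copy the string, then substitute on the copy.
(my $star = $orig) =~ s/x*//;   # empty match at offset 0: nothing is removed
(my $plus = $orig) =~ s/x+//;   # needs at least one x, so the match starts at the x's

# Where does each pattern start matching?
my $star_at = $orig =~ /x*/ ? $-[0] : -1;
my $plus_at = $orig =~ /x+/ ? $-[0] : -1;

print "s/x*//: '$star' (match started at offset $star_at)\n";
print "s/x+//: '$plus' (match started at offset $plus_at)\n";
```

The x* substitution leaves the string untouched because it replaced an empty match at offset 0. The x+ version starts at offset 5 and removes all seven x's, leaving two adjacent spaces.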

_____________________________________________________
Jeff japhy Pinyan: Perl, regex, and perl hacker.
s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

Re: Shouldn't \S* be ^\S*
by John M. Dlugosz (Monsignor) on Aug 03, 2001 at 02:21 UTC
    Ah, I see: it matches zero occurrences, succeeds, and stops looking. Changing it to s/x+//; shows that it does indeed march forward when it can't find a match at the current position; it just happens that * is always happy.