Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

\s+.*?
Am I correct to say the above regular expression equals:

\s+
an unlimited number of spaces, including newlines?


.*?
an unlimited number of ANY characters, but I am not sure what the "?" does?

Replies are listed 'Best First'.
Re: Req Expression translation
by sauoq (Abbot) on Jun 13, 2003 at 19:55 UTC

    \s+ will match at least one whitespace character¹. The match will consume as many as it can.

    .*? will match 0 or more characters (except for newlines, which it will only match if the /s modifier is given.) The ? makes the match non-greedy. It will match as little as is possible. It isn't particularly useful unless there is something after that portion of the regex.

    1. Bold text added for exactness. See comments below. \s is equivalent to the character class: [ \r\n\f\t]. (And thanks to diotalevi for catching my laziness.)
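
A minimal sketch of the three points above, assuming a made-up sample string (the text and variable names are illustrative, not from the original post). It shows \s+ consuming a whole run of whitespace, .*? being satisfied by nothing when no pattern follows it, and . stopping at a newline unless /s is given.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample text: three spaces after "foo", and an embedded newline.
my $text = "foo   bar\nbaz";

# \s+ is greedy: it consumes the whole run of three spaces.
my ($ws) = $text =~ /foo(\s+)/;
print "whitespace run length: ", length($ws), "\n";

# With nothing after it, a non-greedy .*? is satisfied by matching nothing.
my ($lazy) = $text =~ /foo\s+(.*?)/;
print "lazy with nothing after: '$lazy'\n";

# Without /s, '.' stops at the newline; with /s it crosses it.
my ($no_s)   = $text =~ /foo\s+(.*)/;
my ($with_s) = $text =~ /foo\s+(.*)/s;
print "without /s: '$no_s'\n";
print "with /s   : '$with_s'\n";
```

The empty capture in the second match is why a trailing .*? only becomes interesting once something follows it in the pattern.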

    -sauoq
    "My two cents aren't worth a dime.";
    

      \s+ will match at least one space
      That's "whitespace". "space" is a character which happens to be whitespace; there are four other characters that match it as well (tab, space, carriage return, line feed, vertical tab)

      [Added - sauoq caught me out on the VTAB vs FF character. I never use either so I forget that part. I knew chr(12) was whitespace (through running perl -le '$,="\n"; print grep chr() =~ /\s/, 0 .. 255') but I didn't remember what it was.]

        Yes, of course; "whitespace."

        Your list is wrong though. You are missing form-feed ("\f") and you have incorrectly included a vertical tab.

        perl -le 'print "\f" =~ /\s/ ? "match" : "no match"' # form-feed
        perl -le 'print chr(11) =~ /\s/ ? "match" : "no match"' # 11 is vertical tab.
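
As a rough cross-check of the list, here is a sketch that tests each candidate character by name rather than by number. The %candidates table and its labels are only an illustration. One caveat: this thread dates from 2003, and as far as I know Perl 5.18 and later do include vertical tab in \s, so that one line may print differently depending on your perl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative table of candidate characters (the names are just labels).
my %candidates = (
    'space'           => ' ',
    'tab'             => "\t",
    'line feed'       => "\n",
    'carriage return' => "\r",
    'form feed'       => "\f",
    'vertical tab'    => chr(11),
);

# Record whether each candidate matches \s and report it, sorted by name.
my %is_ws;
for my $name (sort keys %candidates) {
    $is_ws{$name} = $candidates{$name} =~ /\s/ ? 1 : 0;
    printf "%-16s %s\n", $name, $is_ws{$name} ? 'match' : 'no match';
}
```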
        -sauoq
        "My two cents aren't worth a dime.";
        
      I am using it as:
      anotherwordhere\s+.*? wordhere
      I am trying to fetch a match on a word before "wordhere" and it does work but not sure what the "?" does.

      You said "?" acts as non-greedy here. Please explain more on this? I am still not sure what "?" does?

        Sometimes an example helps...

        #!/usr/bin/perl -w
        use strict;

        my $string = 'anotherwordhere foobar wordhere baz qux wordhere';

        $string =~ /anotherwordhere\s+(.*?) wordhere/;
        print "Non-greedy match: '$1'\n";

        $string =~ /anotherwordhere\s+(.*) wordhere/;
        print "Greedy match : '$1'\n";
        That prints:
        Non-greedy match: 'foobar'
        Greedy match : 'foobar wordhere baz qux'
        Do you see why it is called non-greedy? It matches as little as it has to in order to get the job done. The normal (greedy) behavior is to match as much as it can (and still get the job done.)

        The getting the job done part is what's most important. The distinction between greedy and non-greedy is only significant when there is both a short and a long way to get the job done. If you changed the string in that example so that it ended in "wordnothere", both would match the same thing.
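
To see that last point concretely, here is a small sketch with the example string changed to end in "wordnothere" (the variable names $lazy and $greedy are mine). With only one " wordhere" left to land on, the greedy version has to backtrack to the same place where the non-greedy one stops.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same string as before, but the final "wordhere" is now "wordnothere".
my $string = 'anotherwordhere foobar wordhere baz qux wordnothere';

# Capture with the non-greedy and the greedy quantifier in turn.
my ($lazy)   = $string =~ /anotherwordhere\s+(.*?) wordhere/;
my ($greedy) = $string =~ /anotherwordhere\s+(.*) wordhere/;

print "Non-greedy match: '$lazy'\n";
print "Greedy match    : '$greedy'\n";
print "Same result? ", ($lazy eq $greedy ? "yes" : "no"), "\n";
```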

        I hope that clears it up a bit. Some experimentation will probably help your understanding.

        -sauoq
        "My two cents aren't worth a dime.";
        
Re: Req Expression translation
by The Mad Hatter (Priest) on Jun 13, 2003 at 21:09 UTC
    Update: Man, do I feel stupid. \s is the equivalent of [ \t\n\r\f], which, of course, includes newlines.

    Original post (incorrect, see the update above): \s+ will match, as sauoq said, at least one space, but not newlines.