in reply to Req Expression translation

\s+ will match at least one whitespace character¹. The match will consume as many as it can.

.*? will match 0 or more characters (except for newlines, which it will only match if the /s modifier is given). The ? makes the match non-greedy: it will match as little as possible. It isn't particularly useful unless there is something after that portion of the regex.
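To make both points concrete, here is a minimal sketch. The sample string and variable names are made up for illustration; they are not from the original question.

```perl
use strict;
use warnings;

# Hypothetical sample text with a newline between "foo" and "bar".
my $text = "foo\nbar baz";

# With nothing after it, a non-greedy .*? is satisfied by the empty string.
my ($lazy_alone) = $text =~ /foo(.*?)/s;

# Without /s, "." will not match the newline, so "bar" is out of reach.
my $without_s = $text =~ /foo.*?bar/ ? "match" : "no match";

# With /s, "." matches the newline as well.
my $with_s = $text =~ /foo.*?bar/s ? "match" : "no match";

print "lazy alone: '$lazy_alone'\n";
print "without /s: $without_s\n";
print "with /s:    $with_s\n";
```

The first capture comes back empty, which is why a trailing .*? with nothing after it is rarely what you want.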

1. Bold text added for exactness. See comments below. \s is equivalent to the character class: [ \r\n\f\t]. (And thanks to diotalevi for catching my laziness.)

-sauoq
"My two cents aren't worth a dime.";

Re: Re: Req Expression translation
by diotalevi (Canon) on Jun 13, 2003 at 21:25 UTC

    \s+ will at least one space
    That's "whitespace". "space" is one character that happens to be whitespace; there are four other characters that match it as well (tab, space, carriage return, line feed, vertical tab).

    [Added - sauoq caught me out on the VTAB vs FF character. I never use either so I forget that part. I knew chr(12) was whitespace (through running perl -le '$,="\n"; print grep chr() =~ /\s/, 0 .. 255') but I didn't remember what it was.]

      Yes, of course; "whitespace."

      Your list is wrong though. You are missing form-feed ("\f") and you have incorrectly included a vertical tab.

      perl -le 'print "\f" =~ /\s/ ? "match" : "no match"' # form-feed
      perl -le 'print chr(11) =~ /\s/ ? "match" : "no match"' # 11 is vertical tab.
      -sauoq
      "My two cents aren't worth a dime.";
      
Re: Re: Req Expression translation
by Anonymous Monk on Jun 14, 2003 at 02:34 UTC
    I am using it as:
    anotherwordhere\s+.*? wordhere
    I am trying to fetch a match on a word before "wordhere", and it does work, but I am not sure what the "?" does.

    You said "?" acts as non-greedy here. Could you please explain this further? I am still not sure what "?" does.

      Sometimes an example helps...

      #!/usr/bin/perl -w
      use strict;

      my $string = 'anotherwordhere foobar wordhere baz qux wordhere';
      $string =~ /anotherwordhere\s+(.*?) wordhere/;
      print "Non-greedy match: '$1'\n";
      $string =~ /anotherwordhere\s+(.*) wordhere/;
      print "Greedy match : '$1'\n";
      That prints:
      Non-greedy match: 'foobar'
      Greedy match : 'foobar wordhere baz qux'
      Do you see why it is called non-greedy? It matches as little as it has to in order to get the job done. The normal (greedy) behavior is to match as much as it can (and still get the job done.)

      The "getting the job done" part is what's most important. The distinction between greedy and non-greedy is only significant when there is both a short and a long way to get the job done. If you changed the string in that example so that it ended in "wordnothere", both would match the same thing.
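The "wordnothere" variation can be checked with a short sketch along these lines. It reuses the patterns from the example above; the variable names $lazy and $greedy are just illustrative.

```perl
use strict;
use warnings;

# Same string as before, except the final word is now "wordnothere",
# so there is only one place where " wordhere" can match.
my $string = 'anotherwordhere foobar wordhere baz qux wordnothere';

my ($lazy)   = $string =~ /anotherwordhere\s+(.*?) wordhere/;
my ($greedy) = $string =~ /anotherwordhere\s+(.*) wordhere/;

print "Non-greedy match: '$lazy'\n";
print "Greedy match    : '$greedy'\n";
```

Both captures come out as 'foobar': the greedy version starts by grabbing everything, then backtracks until " wordhere" can follow, and there is only one such spot.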

      I hope that clears it up a bit. Some experimentation will probably help your understanding.

      -sauoq
      "My two cents aren't worth a dime.";
      
        Thanks!