perlynewby has asked for the wisdom of the Perl Monks concerning the following question:

I have read and tested my code but I don't quite understand the output in 2 cases, will you explain?

I get two different outputs from two nearly identical regexes; the only difference is a space between what I assigned to be my $1 and my $2 captures

I have googled this and read thru perl tutorials and I have found no real explanation that can answer my simple question, perhaps you can help?

s/^(.*)+([^ ]+)$/$2, $1/; # $1 = anything from the start (the first name), $2 = anything else after

The version below worked. Please note the space between the $1 capture and the $2 capture. I only found that it worked because I fat-fingered it while looking for the reason my first try didn't work. I ended up capturing the last name a different way, but I still need an answer on this

s/^(.*) +([^ ]+)$/$2, $1/;

entire test code that gives me an error for demo

use strict;

my @cognome = last_name_first("leonardo Da Vincy", "Raffaello da Urbino");
print join(";", @cognome);

sub last_name_first {
    my @names = @_;
    foreach (@names) {
        s/^(.*)+([^ ]+)$/$2, $1/;     # $1 = anything from the start (first name), $2 = anything else after..FAIL!
        #s/^(.*) +([^ ]+)$/$2, $1/;   # works... space seems to do it, why?
    }
    return @names;
}

Replies are listed 'Best First'.
Re: need help with explaining the output
by Athanasius (Archbishop) on Jul 16, 2015 at 07:24 UTC

    Hello perlynewby,

    First off, it’s good that you use strict; but you should always use warnings; as well.

    Now, in this regex:

    /^(.*)+([^ ]+)$/

    the characters * and + are both quantifiers, meaning they tell the regex engine how many of the preceding entity it should try to match. * means zero or more, and + means one or more. So the construct (.*)+ means: match zero or more non-newline characters, and do this one or more times. Which doesn’t make a lot of sense. See “Quantifiers” in perlre#Regular-Expressions and consider carefully which quantifier you need.
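    As a minimal sketch of the difference between the two quantifiers (the helper name matches and the test strings are made up for illustration, not taken from the original post):

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $s is matched by the regex $re, else 0.
sub matches {
    my ($s, $re) = @_;
    return $s =~ $re ? 1 : 0;
}

# '*' accepts zero occurrences of 'a'; '+' requires at least one.
for my $s ('', 'a', 'aaa') {
    printf "%-6s a*: %d  a+: %d\n", "'$s'",
        matches($s, qr/^a*$/), matches($s, qr/^a+$/);
}
```

    Only the empty string is treated differently: it satisfies a* but not a+.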

    As stevieb has explained, these quantifiers are greedy, meaning the regex engine will try to match as many characters as possible, giving characters back only when the rest of the pattern cannot otherwise match. To make the quantifiers non-greedy, append a ?:

    use strict;
    use warnings;

    my @original_names = (
        'Fred Flintstone',
        'Leonardo da Vinci ',
        'Raffaello da Urbino',
    );
    my @cognome = last_name_first(@original_names);
    print join(';', @cognome);

    sub last_name_first {
        my @names = @_;
        for (@names) {
            s/\s+$//;
            s/^(.*?)(\S+)$/$2, $1/;
            s/\s+$//;
        }
        return @names;
    }

    Output:

    17:14 >perl 1308_SoPW.pl
    Flintstone, Fred;Vinci, Leonardo da;Urbino, Raffaello da
    17:14 >

    Note that the above script also removes trailing whitespace from the last and first names. In the former case, the whitespace might be there; in the latter case, it is certain to be. The character class \S matches any non-whitespace character.
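    To see what the trailing ? changes, here is a small sketch that runs the greedy and the non-greedy pattern on the same name. The helper split_name is hypothetical and exists only for this illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the captures ($1, $2) that $re produces
# on $name, or an empty list if the pattern does not match.
sub split_name {
    my ($name, $re) = @_;
    return $name =~ $re;    # a match in list context returns its captures
}

my $name   = 'Fred Flintstone';
my @greedy = split_name($name, qr/^(.*)(\S+)$/);     # .*  takes as much as it can
my @lazy   = split_name($name, qr/^(.*?)(\S+)$/);    # .*? takes as little as it can

print "greedy: [$greedy[0]] [$greedy[1]]\n";    # greedy: [Fred Flintston] [e]
print "lazy:   [$lazy[0]] [$lazy[1]]\n";        # lazy:   [Fred ] [Flintstone]
```

    The non-greedy $1 still ends in a space, which is why the script above strips trailing whitespace again after the substitution.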

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

      Athanasius, these are great details. I am sure I will make more mistakes with this greedy regex matching. I thought it'd be a nice way to get my hands dirty here... I did, I learned, and I have more to practice.

      I will try the advice about the regex engine given to me here.

      Thank you Sir!
Re: need help with explaining the output
by 1nickt (Canon) on Jul 16, 2015 at 00:04 UTC

    Read perlretut (the Perl regular expressions tutorial), especially the part on character classes. You probably want a common class like \w.

    Your first regexp matches everything up until the second one matches.

    It's a good use of your time to stop what you are doing, take 30 minutes, and read and understand the tutorial!

    Update: links to docs

    The way forward always starts with a minimal test.

      In the end, my solution was to use \w so the regex properly captures the first and last names from the string... but

      I did read the tutorials on regex and I am still unable to understand how the space there made it work

      thanks

        Because without the space, you are capturing EVERYTHING (.*) INCLUDING spaces, up until the very last non-whitespace character in the string. Without the space, the regex doesn't know to stop at a space.

        I always use \s+ in place of literal spaces. I find it makes the regex far easier to understand, and way less likely I'll overlook a literal space (which is exceptionally easy to do).

        -stevieb
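        A short sketch of why \s+ can be more forgiving than a literal space, assuming a name whose last separator is a tab. The helper names swap_literal and swap_class and the tabbed input are invented for this example:

```perl
use strict;
use warnings;

# Hypothetical helpers: the same rewrite, once with a literal space
# and once with the \s / \S character classes.
sub swap_literal { my ($n) = @_; $n =~ s/^(.*) +([^ ]+)$/$2, $1/; return $n; }
sub swap_class   { my ($n) = @_; $n =~ s/^(.*)\s+(\S+)$/$2, $1/;  return $n; }

my $spaces = "leonardo Da Vincy";       # separated by single spaces
my $tabbed = "Raffaello da\tUrbino";    # last separator is a tab

print swap_literal($spaces), "\n";    # Vincy, leonardo Da
print swap_class($spaces),   "\n";    # Vincy, leonardo Da
print swap_class($tabbed),   "\n";    # Urbino, Raffaello da

# The literal-space version only sees the one real space, so the tab
# ends up inside $2: the result is "da<TAB>Urbino, Raffaello".
print swap_literal($tabbed), "\n";
```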

Re: need help with explaining the output
by stevieb (Canon) on Jul 16, 2015 at 00:23 UTC

    The commented out regex grabs everything up to a space greedily... that is, $1 matches everything up to the last literal space before end of string (so it grabs up to " Vincy"), and $2 is populated with the run of non-space characters between that space and the end of string. Put a space in "Vincy" to see for yourself.

    The first regex simply matches everything greedily all the way up to the very last non-whitespace character before end of string, and puts the rest in $2. The 'rest' in this case is just the last letter of the last name.

    -stevieb
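    One way to check this is to print the captures directly instead of looking at the substituted string. This is only a sketch: it drops the stray + after (.*) from the original pattern, an assumption made here so that the captures are easy to read.

```perl
use strict;
use warnings;

my $name = 'leonardo Da Vincy';

# Without the space, (.*) swallows all but the one character
# that [^ ]+ needs in order to match.
my @no_space = $name =~ /^(.*)([^ ]+)$/;

# With the space, (.*) has to stop before the last space in the string.
my @with_space = $name =~ /^(.*) +([^ ]+)$/;

print "no space:   \$1=[$no_space[0]] \$2=[$no_space[1]]\n";
# no space:   $1=[leonardo Da Vinc] $2=[y]
print "with space: \$1=[$with_space[0]] \$2=[$with_space[1]]\n";
# with space: $1=[leonardo Da] $2=[Vincy]
```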

      thanks I think I got it with your quick explanation.

Re: need help with explaining the output -- regex tools
by Discipulus (Canon) on Jul 16, 2015 at 07:43 UTC

      Grazie Mille for the pointer to the regex coach...I got to playing with it last night and I think it was a good source to test my expressions