Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I'm having coder's block and can't figure this out. I have a list of names where the last initial is sometimes there, with or without a period. How do I create a regex to split out the first name? The initial is always 1 letter, with or without a period. All I want are the first names.
#!/usr/bin/perl -w
use strict;
use Data::Dumper;

my @names = [ "Mark K.", "Bob H", "Kurt", "Mary Kay K", "Mary Jo Z.", "Mary Jo" ];

foreach my $name ( @names ) {
    my $f_name = $name =~ m/(.*)[a-zA-Z]/;
    print "$f_name \n";
}

# @names with only first names would look like:
# my @names = [ "Mark", "Bob", "Kurt", "Mary Kay", "Mary Jo", "Mary Jo" ];

Re: Parsing out first names
by prasadbabu (Prior) on Oct 13, 2006 at 05:23 UTC
Re: Parsing out first names
by ysth (Canon) on Oct 13, 2006 at 04:44 UTC
    The part matching the last initial, followed by an optional ., needs to be anchored to the end, and you probably don't want the space before it, either.
    /(.*) [a-zA-Z]\.?\z/
    You also need to have the match in list context to have it return the captured substring:  my ($f_name) = $name =~ m/...

    And I think you mean to have (), not [], in the assignment to @names.
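
The three corrections above can be put together in a minimal sketch. The helper name `first_name` is hypothetical, and the fallback to the whole name when nothing matches is an assumption (the pattern alone returns nothing for "Kurt" or "Mary Jo"):

```perl
use strict;
use warnings;

# Hypothetical helper, not from the thread.
sub first_name {
    my ($name) = @_;

    # The parentheses around $f_name put the match in list context,
    # so it returns the captured substring rather than a true/false value.
    my ($f_name) = $name =~ /(.*) [a-zA-Z]\.?\z/;

    # Assumption: a name without a trailing initial is kept whole.
    return defined $f_name ? $f_name : $name;
}

# Parentheses, not square brackets, build the list of names.
my @names = ( "Mark K.", "Bob H", "Kurt", "Mary Kay K", "Mary Jo Z.", "Mary Jo" );

print first_name($_), "\n" for @names;
```

Without the fallback, the names that have no initial would come back undefined, so some handling of the no-match case is needed either way.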

Re: Parsing out first names
by grep (Monsignor) on Oct 13, 2006 at 04:44 UTC
    I wouldn't use a regex (well sorta). I'd use a split on whitespace. This also makes it easy to get last names by popping off the last value.

    Also watch your brackets. You want () for an array.
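
To see why the brackets matter, here is a small sketch (the variable names `@wrong` and `@right` are only for illustration). Square brackets build a reference to an anonymous array, so the array ends up holding a single element:

```perl
use strict;
use warnings;

my @wrong = [ "Mark K.", "Bob H" ];   # one element: a reference to an anonymous array
my @right = ( "Mark K.", "Bob H" );   # two elements: the strings themselves

print scalar(@wrong), "\n";   # prints 1
print ref($wrong[0]), "\n";   # prints ARRAY
print scalar(@right), "\n";   # prints 2
```

With the bracketed version, the foreach loop in the question runs once, with `$name` set to the array reference instead of a name.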

    my @names = ( "Mark K.", "Bob H", "Kurt", "Mary Kay K", "Mary Jo Z.", "Mary Jo" );

    foreach ( @names ) {
        my ($f_name) = (split)[0];
        my ($l_name) = (split)[-1];
        print "$f_name \n";
    }


    grep
    One dead unjugged rabbit fish later
      This is a good thought, but it only prints out:
      Mark
      Bob
      Kurt
      Mary
      Mary
      Mary
      which is only the first part of the first names--some of them are not complete.
        So do what grep already hinted at, split and keep only the parts you want:
        foreach ( @names ) {
            my @name_parts = split;
            pop @name_parts if $name_parts[-1] =~ /^[a-zA-Z]\.?$/;  # throw last name away
            my $f_name = join ' ', @name_parts;
            print "$f_name\n";
        }

        Updated code to reflect Not_a_Number's correction, thanks!

        -- Hofmator

        Code written by Hofmator and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.

Re: Parsing out first names
by McDarren (Abbot) on Oct 13, 2006 at 04:42 UTC
    "..The initial is always 1 letter with our without a period..."

    Will it always be uppercase? (let's assume yes)

    So how about this (untested)...

    # Non-greedy capture of everything from the start of the string, until we see
    # a whitespace character
    # followed by a single upper-case letter
    # followed by an (optional) period
    # followed by the end of the string
    /^(.*?)\s[A-Z]\.?$/;

    Cheers,
    Darren :)
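
A quick way to try the untested pattern above is to wrap it in a small helper. This is only a sketch: the name `strip_initial` is hypothetical, and returning the name unchanged when the pattern does not match is an assumption added here, since the pattern by itself captures nothing for names without an initial.

```perl
use strict;
use warnings;

# Hypothetical helper; the pattern is the one proposed above.
sub strip_initial {
    my ($name) = @_;

    # On a match, $1 holds everything before the trailing initial.
    # Assumption: on no match, return the name unchanged.
    return $name =~ /^(.*?)\s[A-Z]\.?$/ ? $1 : $name;
}

my @names = ( "Mark K.", "Bob H", "Kurt", "Mary Kay K", "Mary Jo Z.", "Mary Jo" );

print strip_initial($_), "\n" for @names;
```

Because the character class is `[A-Z]`, a lowercase initial such as the one in "bob h" would be left in place, which is the uppercase assumption stated above.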

Re: Parsing out first names
by awohld (Hermit) on Oct 13, 2006 at 04:54 UTC
    You could substitute out the last single character, with or without a period ".".

    You also had an anonymous array ref in your @names array. Fixed in the below code.
    #!/usr/bin/perl -w
    use strict;
    use Data::Dumper;

    my @names = ( "Mark K.", "Bob H", "Kurt", "Mary Kay K", "Mary Jo Z.", "Mary Jo" );

    foreach my $name ( @names ) {
        $name =~ s/ [a-zA-Z].?$//;
        print "$name \n";
    }
    Prints:
    Mark
    Bob
    Kurt
    Mary Kay
    Mary Jo
    Mary
      Sorry to nitpick, but it needs to be: $name =~ s/ [a-zA-Z]\.?$// Yours chopped the "Jo" off of the second Mary Jo :(
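
The difference between the two patterns can be seen side by side in a short sketch (the helper names `strip_loose` and `strip_strict` are hypothetical). An unescaped `.` matches any character, so `.?` can swallow the "o" in "Jo", while `\.` matches only a literal period:

```perl
use strict;
use warnings;

# Hypothetical helpers wrapping the two substitutions being compared.
sub strip_loose {
    my ($name) = @_;
    $name =~ s/ [a-zA-Z].?$//;    # "." matches any character
    return $name;
}

sub strip_strict {
    my ($name) = @_;
    $name =~ s/ [a-zA-Z]\.?$//;   # "\." matches only a literal period
    return $name;
}

for my $name ( "Mary Jo", "Mary Jo Z." ) {
    printf "%-12s loose='%s' strict='%s'\n",
        $name, strip_loose($name), strip_strict($name);
}
```

For "Mary Jo" the loose version returns "Mary" and the strict version returns "Mary Jo". Both return "Mary Jo" for "Mary Jo Z.".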
Re: Parsing out first names
by johngg (Canon) on Oct 13, 2006 at 14:15 UTC
    I'm a little late to this party but I would take either of two approaches using regular expressions. The trailing initial is optional and occurs at the end of the string, so you can match it with (?:\s[A-Z]\.?)?\z and either discard it via substitution or capture a non-greedy match of everything up to it. Like this:

    use strict;
    use warnings;

    my @names = (
        q{Mark K.}, q{Bob H}, q{Kurt},
        q{Mary Kay K}, q{Mary Jo Z.}, q{Mary Jo});

    print
        qq{Original    Substituted Matched\n},
        qq{--------    ----------- -------\n};

    foreach my $name (@names)
    {
        (my $firstNameBySubs = $name) =~ s{(?:\s[A-Z]\.?)?\z}{};
        my ($firstNameByMatch) = $name =~ m{^(.*?)(?:\s[A-Z]\.?)?\z};
        printf qq{%-12s%-12s%-s\n},
            $name,
            $firstNameBySubs,
            $firstNameByMatch;
    }

    This produces

    Original    Substituted Matched
    --------    ----------- -------
    Mark K.     Mark        Mark
    Bob H       Bob         Bob
    Kurt        Kurt        Kurt
    Mary Kay K  Mary Kay    Mary Kay
    Mary Jo Z.  Mary Jo     Mary Jo
    Mary Jo     Mary Jo     Mary Jo

    Cheers,

    JohnGG

Re: Parsing out first names
by swampyankee (Parson) on Oct 13, 2006 at 17:49 UTC

    Like grep, I'd use split, splitting on whitespace.

    This will give you a list (which may have one element). Since your list doesn't contain surnames, I'd filter the list with grep to eliminate initials, except when their elimination leaves nothing (I know people who use initials in lieu of their given names).

    I've got a sample here:

    #!perl
    use strict;
    use warnings;

    my @names = ( 'J Q', 'G Gordon', 'Mary Jane', 'Tommy K', 'Madonna', 'George W.', 'Jacques-Yves');

    foreach my $name (@names){
        my @nl = split(/\s+/, $name);
        my @list = grep { ! /^[A-Z][^A-Z]*$/i} @nl;
        pop(@nl) if (@list and $#nl > 0 and $nl[-1] =~ /^[A-Za-z][^A-Za-z]*$/);
        $name = join(' ', @nl);
    }

    print join("\n", @names) . "\n";
    which produces this output:
    J Q
    G Gordon
    Mary Jane
    Tommy
    Madonna
    George
    Jacques-Yves

    I know it could be both better written and much shorter, but it's intended to be a demonstration, not production code.

    emc

    At that time [1909] the chief engineer was almost always the chief test pilot as well. That had the fortunate result of eliminating poor engineering early in aviation.

    —Igor Sikorsky, reported in AOPA Pilot magazine February 2003.