Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks!
This code capitalizes each word just fine, but my question is: if I get a person's name like
mcdonald or mcmaster or MCENROE, what would be the best way to use this code while making sure that names like these are also capitalized properly?
Here is the code:
#!/perl/bin/perl -w
use strict;

## capitalize each word's first character, downcase the rest
my $test  = cap("o'tools");
my $test2 = cap("mcdonalds");
my $test3 = cap("mcmaster");
my $test4 = cap("MCENROE");

print "**1-$test**\n";  # should print "O'Tools"
print "**2-$test2**\n"; # should print "Mc Donalds"
print "**3-$test3**\n"; # should print "Mc Master"
print "**4-$test4**\n"; # should print "Mc Enroe"

sub cap {
    my $case = shift || '';
    $case =~ s/(\w+)/\u\L$1/g;
    return $case;
}

Thanks for the Help!

Replies are listed 'Best First'.
Re: Capitalization Case Help!
by Corion (Patriarch) on Sep 15, 2010 at 19:33 UTC

    I'm pretty sure that you can't do that, as there are many different ways people write their family names, even if they all look the same uppercased. But if you really want to irk your customers with misspelled names, have a look at String::ProperCase::Surname.

Re: Capitalization Case Help!
by Your Mother (Archbishop) on Sep 15, 2010 at 20:05 UTC

    Corion is right. This problem cannot be solved programmatically (not perfectly anyway). Names do not conform to any spelling rules and the affixes, capitals, and possible spacing also morph. Even in your "should print" examples you have chosen decidedly non-standard, but entirely possible, spellings. McEnroe is more likely to be right than Mc Enroe but you will also find Mcenroe, Macenroe, MacEnroe, and Mac Enroe exist in the wild.

Re: Capitalization Case Help!
by kennethk (Abbot) on Sep 15, 2010 at 20:31 UTC
    You may wish to consider the following off-site post before doing any significant processing on people's names. It doesn't provide much actionable information, but it will point out a lot of assumptions you've made. http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/.

    Using a name as a primary identifier can be extraordinarily and unnecessarily complicated for a given task.

Re: Capitalization Case Help!
by graff (Chancellor) on Sep 16, 2010 at 01:37 UTC
    Now that you've gotten the necessary lectures about why you shouldn't be doing this, here is a way to do specifically what you asked for (even though I think the insertion of the space is a mistake more often than not, so I would be inclined to leave out the added space):
    sub cap {
        local $_ = shift || '';
        s/\b(\w\S*)/\u\L$1/g;
        s/(?<=\bO')(\w)/\u$1/g;
        s/(?<=\bMc)(\w)/ \u$1/g;
        return $_;
    }
    Since that uses zero-width look-behind, which has to involve a fixed number of characters to match, you'd have to add another separate regex to handle the case of "Mac..." (that is, just adding "a?" to the last regex above would cause a run-time error).
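
A minimal sketch of that separate "Mac..." regex, assuming the added space is left out as suggested above. The name cap_mac is hypothetical and is not part of the code posted in this thread. Each look-behind stays fixed-width, so it also runs on Perls older than 5.10:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): same approach as cap() above,
# plus a second fixed-width look-behind for the "Mac" prefix.
sub cap_mac {
    my $name = shift;
    $name = '' unless defined $name;

    $name =~ s/\b(\w\S*)/\u\L$1/g;       # first letter up, rest of each word down
    $name =~ s/(?<=\bO')(\w)/\u$1/g;     # O'toole   -> O'Toole
    $name =~ s/(?<=\bMc)(\w)/\u$1/g;     # Mcdonald  -> McDonald
    $name =~ s/(?<=\bMac)(\w)/\u$1/g;    # Macdonald -> MacDonald
    return $name;
}

print cap_mac($_), "\n" for qw(o'toole mcdonald MACDONALD);
```

Note that this blindly treats every leading "Mac" as a prefix, so a name such as "macy" comes out as "MacY". That is exactly the kind of mistake the earlier replies warn about.
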
      ... zero-width look-behind ... has to involve a fixed number of characters to match ...

      Perl 5.10 adds the \K special escape, which behaves like a variable-width look-behind (see "(?<=pattern)" and "\K" in the Look-Around Assertions section of perlre):

      >perl -wMstrict -le "print qq{ver $]}; for (@ARGV) { my $name = $_; $name =~ s{ \b Ma?c \K (\w) }{ \u$1}xmsg; print qq{'$_' -> '$name'}; } " Mcdonald Macdonald
      ver 5.010001
      'Mcdonald' -> 'Mc Donald'
      'Macdonald' -> 'Mac Donald'
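
For comparison, here is a rough sketch of how the \K idea might be folded back into a sub like the original cap(). The name cap_k is hypothetical, and the sketch leaves out the added space. It assumes Perl 5.10 or later, and it shares all the caveats about real-world names raised earlier in the thread:

```perl
use strict;
use warnings;
use 5.010;    # \K is only available from Perl 5.10 on

# Hypothetical helper (not from the thread): one \K substitution covers
# the O', Mc and Mac prefixes, since \K has no fixed-width restriction.
sub cap_k {
    my $name = shift;
    $name = '' unless defined $name;

    $name =~ s/\b(\w\S*)/\u\L$1/g;            # first letter up, rest of each word down
    $name =~ s/\b(?:O'|Ma?c)\K(\w)/\u$1/g;    # keep the prefix, uppercase the next letter
    return $name;
}

print cap_k($_), "\n" for qw(o'toole mcenroe macdonald);
```

Everything matched before \K is left untouched by the substitution, so only the captured letter is replaced.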