Correct case

stew has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Correct case by dbush (Deacon) on Jan 13, 2003 at 16:51 UTC
I've not personally used it, but the module Lingua::EN::NameCase may be useful. It seems to have a series of rules that deal with cases like the following:

```
Original            Name Case
--------            ---------
KEITH               Keith
LEIGH-WILLIAMS      Leigh-Williams
MCCARTHY            McCarthy
O'CALLAGHAN         O'Callaghan
ST. JOHN            St. John
```

Regards, Dom.
Re: Re: Correct case by herveus (Prior) on Jan 13, 2003 at 21:25 UTC
Howdy! ...but feed it "Owen ap Tudor", "Sveinn inn Danska", and "David ben Jesse"... Of course, that's Lingua::EN::NameCase, not Lingua::welsh, scandihoovian, or jewish::NameCase... The original question has no general solution, since name capitalization conventions vary from culture to culture. yours, Michael
Re: Re: Re: Correct case by dbush (Deacon) on Jan 14, 2003 at 14:02 UTC
Hi herveus, ++ on the point that there is no general solution to this problem, but I thought I would just check out how Lingua::EN::NameCase would deal with your examples...

```perl
#!perl -w
use strict;
use warnings;
use Lingua::EN::NameCase qw( NameCase nc );

my @proper_names = (
    'Owen ap Tudor',
    'Sveinn inn Danska',
    'David ben Jesse',
);
my @lowercase_names = map { lc } @proper_names;
my @result = NameCase( @lowercase_names );

# Compare each re-cased name with the original and print one report row
my ($iCount, $bMatch);
for ($iCount = 0; $iCount <= $#result; $iCount++) {
    $bMatch = $proper_names[$iCount] eq $result[$iCount] ? 'Y' : 'N';
    write;
}
exit;

format STDOUT_TOP =
Original case                  Output from NameCase           Match
============================== ============================== =====
.
format STDOUT =
@<<<<<<<<<<<<<<<<<<<<<<<<<<<<< @<<<<<<<<<<<<<<<<<<<<<<<<<<<<< @<<<<
$proper_names[$iCount],        $result[$iCount],              $bMatch
.
__END__
```

Surprisingly the output looks like this:

```
Original case                  Output from NameCase           Match
============================== ============================== =====
Owen ap Tudor                  Owen ap Tudor                  Y
Sveinn inn Danska              Sveinn Inn Danska              N
David ben Jesse                David ben Jesse                Y
```

"Sveinn Inn Danska" is not correct, but surprisingly for a Lingua::EN module it does deal well with the other cases. Also, from the documentation, there is a variable that can be set to deal with special cases for Spanish... Perhaps the module itself has evolved to become a misnomer.

Regards, Dom.

Update: Corrected typo on link.
Update: Corrected typo on incorrect name.
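For anyone who wants to see what a hand-rolled exception list would look like, here is a minimal sketch that does not use the module at all. The `%keep_lower` hash and the sub name are made up for this example, and the particle list comes only from the three names quoted in this thread, so treat it as an illustration of the problem rather than a solution.

```perl
use strict;
use warnings;

# Hypothetical exception list: only the particles mentioned in this thread.
my %keep_lower = map { $_ => 1 } qw( ap inn ben );

# Lowercase every word, then capitalise it unless it is a listed particle.
# The capturing parens in split keep the original whitespace in the list.
sub name_case_with_particles {
    my ($name) = @_;
    return join '', map {
        my $word = lc;
        $keep_lower{$word} ? $word : ucfirst $word;
    } split /(\s+)/, $name;
}

print name_case_with_particles('SVEINN INN DANSKA'), "\n";
print name_case_with_particles('owen ap tudor'), "\n";
```

The weakness is easy to trigger: a first name such as "Ben" is also in the list, so "ben jones" comes back as "ben Jones". That is herveus's point in miniature; any fixed list encodes one culture's conventions.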
Re: Correct case by gjb (Vicar) on Jan 13, 2003 at 16:18 UTC
This should do what you want: `my $upcased = join(" ", map(ucfirst($_), split(/\s+/, $str)));` Note that I didn't test this and that it will replace multiple spaces between words with a single one. Basically, the `ucfirst` function is applied to each word in the list obtained by splitting the original string. After that, the list is joined together again. Hope this helps, -gjb-
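A quick way to try the one-liner above is to wrap it in a sub and feed it a couple of strings. The sub name `upcase_words` is made up for this sketch. Note that `ucfirst` only touches the first character, so all-caps input needs an `lc` first.

```perl
use strict;
use warnings;

# The split/map/join one-liner wrapped in a sub (hypothetical name).
sub upcase_words {
    my ($str) = @_;
    # Split on runs of whitespace, capitalise each word, rejoin with one space
    return join(" ", map(ucfirst($_), split(/\s+/, $str)));
}

print upcase_words("john   smith"), "\n";      # John Smith (spaces collapsed)
print upcase_words("JOHN SMITH"), "\n";        # JOHN SMITH (ucfirst leaves the rest alone)
print upcase_words(lc "JOHN SMITH"), "\n";     # John Smith
```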
Re: Re: Correct case by Zaxo (Archbishop) on Jan 13, 2003 at 16:34 UTC
++gjb. Here is a variation which better preserves the original: `my $upcased = join '', map { ucfirst } split /(\s+)/, $str;` The parens in split's pattern make the separating string be included in the resulting list. The ucfirst function has no effect on the captured whitespace. After Compline, Zaxo
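To see the capturing-split behaviour in isolation, here is a small sketch; the sub name `upcase_keep_spacing` is invented for the example.

```perl
use strict;
use warnings;

# Capturing parens in the split pattern return the separators as list
# elements, so joining on '' restores the original spacing exactly.
sub upcase_keep_spacing {
    my ($str) = @_;
    return join '', map { ucfirst } split /(\s+)/, $str;
}

print upcase_keep_spacing("john   smith"), "\n";   # John   Smith
```

Leading whitespace survives too, because split returns an empty leading field followed by the captured separator.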
Re: Re: Correct case by dragonchild (Archbishop) on Jan 13, 2003 at 16:35 UTC
`$name =~ s/(?:\s|^)(\w)/uc($1)/eg;` Faster and easier to read. (Shorter without obfuscation means less for the human to parse.) ------ We are the carpenters and bricklayers of the Information Age. Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.
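One thing worth checking with a substitution of this shape: the group in front of `(\w)` matches the whitespace, and the replacement only puts back `uc($1)`, so the whitespace is consumed. The sketch below shows that, alongside a variant that uses a lookbehind so nothing but the letter is matched. Both sub names are hypothetical, and the lookbehind version is a suggested alternative, not something posted in this thread.

```perl
use strict;
use warnings;

# The whitespace before each word is part of the match and is not restored.
sub upcase_consuming {
    my ($name) = @_;
    $name =~ s/(?:\s|^)(\w)/uc($1)/eg;
    return $name;
}

# Hypothetical variant: (?<!\S) asserts "start of string or preceded by
# whitespace" without consuming any characters.
sub upcase_lookbehind {
    my ($name) = @_;
    $name =~ s/(?<!\S)(\w)/\u$1/g;
    return $name;
}

print upcase_consuming("john smith"), "\n";    # JohnSmith
print upcase_lookbehind("john smith"), "\n";   # John Smith
```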
(jeffa) Re: Correct case by jeffa (Bishop) on Jan 13, 2003 at 16:25 UTC
Be sure and check out the discussions at this oldie, "Upper case first letter of each _ delimited word". Sure, an underscore is not a space, but you get the point. ;)

jeffa

L-LL-L--L-LL-L--L-LL-L--
-R--R-RR-R--R-RR-R--R-RR
B--B--B--B--B--B--B--B--
H---H---H---H---H---H---
(the triplet paradiddle with high-hat)
Re: Correct case by fletcher_the_dog (Friar) on Jan 13, 2003 at 16:43 UTC
What about just doing `s/\b(\w)/uc($1)/sge;` ?
Re: Correct case by Abigail-II (Bishop) on Jan 13, 2003 at 19:28 UTC
Simpler: `s/\b(\w)/\u$1/g;` Abigail
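The `\b` versions treat every word boundary as the start of a word, which is handy for hyphens and for names like O'Callaghan, but it also fires after any apostrophe. A small sketch (the sub name `upcase_boundary` is made up here) shows both sides:

```perl
use strict;
use warnings;

# Capitalise the word character that follows every word boundary.
sub upcase_boundary {
    my ($str) = @_;
    $str =~ s/\b(\w)/\u$1/g;
    return $str;
}

print upcase_boundary("o'callaghan"), "\n";      # O'Callaghan
print upcase_boundary("leigh-williams"), "\n";   # Leigh-Williams
print upcase_boundary("it's"), "\n";             # It'S
print upcase_boundary(lc "KEITH"), "\n";         # Keith (lowercase first for all-caps input)
```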
Re: Correct case by stew (Scribe) on Jan 13, 2003 at 17:27 UTC
Cheers for all the help. I went for the module Lingua::EN::NameCase.