nop has asked for the wisdom of the Perl Monks concerning the following question:
Hi. I am looking for a snippet or module to proper-case
names, and I didn't see anything on CPAN. It seems simple,
but there are special cases. Does anyone have
code they can share?
Examples:
Fred Smith-Barney III is right; Fred Smith-barney Iii isn't.
Bobby McPhillips is right; Bobby Mcphillips isn't.
Lisa Top, PhD is right; Lisa Top, Phd isn't.
etc.
Thanks
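A minimal rule-based sketch covering the three examples above, assuming plain ASCII names. The exception table `%SPECIAL` and the subs `fix_word` and `proper_case` are hypothetical names for illustration. It lowercases each run of letters, looks the word up in a small exception table (generational suffixes, degrees), capitalizes the letter after a leading "Mc", and otherwise capitalizes only the first letter. "Mac" is deliberately left alone because it is ambiguous (MacDonald vs. Macy).

```perl
use strict;
use warnings;

# Hypothetical exception table: tokens whose casing no simple rule can
# derive (generational suffixes, degrees). Keyed by lowercase form.
my %SPECIAL;
$SPECIAL{ lc $_ } = $_ for qw(II III IV PhD MD);

# Proper-case one alphabetic token.
sub fix_word {
    my ($word) = @_;
    my $lc = lc $word;
    return $SPECIAL{$lc} if exists $SPECIAL{$lc};
    # "Mc" prefix: capitalize the letter after it (mcphillips -> McPhillips).
    return 'Mc' . ucfirst($1) if $lc =~ /^mc(.+)$/;
    return ucfirst $lc;
}

# Proper-case a full name. Only runs of ASCII letters are rewritten, so
# spaces, hyphens, commas and apostrophes act as word boundaries:
# "o'reilly" -> "O'Reilly", "smith-barney" -> "Smith-Barney".
sub proper_case {
    my ($name) = @_;
    $name =~ s/([A-Za-z]+)/fix_word($1)/ge;
    return $name;
}

for my $raw ('fred smith-barney iii', 'BOBBY MCPHILLIPS', 'lisa top, phd') {
    print proper_case($raw), "\n";
}
```

Rules like these will always have counterexamples, so treat the output as a best guess rather than a guaranteed fix.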
Re: Proper case for names
by t0mas (Priest) on Sep 01, 2000 at 14:30 UTC
Re: Proper case for names
by KM (Priest) on Sep 01, 2000 at 18:30 UTC
Look at Lingua::EN::NameCase and Lingua::EN::NameParse.
Cheers,
KM
Buzzcutbuddha (Too much variation in names) - RE: Proper case for names
by buzzcutbuddha (Chaplain) on Sep 01, 2000 at 16:31 UTC
Merlyn said it in this node, RE: Uppercase First Letter w/exceptions, and I'll repeat it: you have names like O'Reilly to test for, and sometimes Mcphillips is correct, depending on the person's own preference... For this reason I don't think a module has been written yet. You can always make one.
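Because the right spelling can depend on the person, one way to handle it is to let a stored preference win over any rule. A minimal sketch, assuming a hypothetical `%PREFERRED` table keyed by a made-up record id; `rule_case` and `display_name` are also illustrative names, not an existing API. Records without a stored preference fall back to a simple "Mc"/first-letter rule.

```perl
use strict;
use warnings;

# Hypothetical table of spellings confirmed by the people themselves,
# keyed by a made-up record id. These always win over the rules.
my %PREFERRED = (
    1042 => 'Bobby Mcphillips',
);

# Fallback rule for one lowercase word: capitalize after "mc",
# otherwise capitalize only the first letter.
sub rule_case {
    my ($word) = @_;
    return 'Mc' . ucfirst($1) if $word =~ /^mc(.+)$/;
    return ucfirst $word;
}

sub display_name {
    my ($id, $raw) = @_;
    return $PREFERRED{$id} if exists $PREFERRED{$id};
    # Lowercase first, then rewrite each letter run; the apostrophe in
    # "o'reilly" is not a letter, so it acts as a word boundary.
    (my $name = lc $raw) =~ s/([a-z]+)/rule_case($1)/ge;
    return $name;
}

print display_name(1042, 'BOBBY MCPHILLIPS'), "\n";  # Bobby Mcphillips
print display_name(7,    'BOBBY MCPHILLIPS'), "\n";  # Bobby McPhillips
print display_name(8,    "TIM O'REILLY"), "\n";      # Tim O'Reilly
```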
(bbq) Re: Proper case for names
by BBQ (Curate) on Sep 01, 2000 at 18:13 UTC
If you want to get closer to your objective, you should aim for "Proper case for names in the English language". IMHO, this is not a problem to be solved by Perl, or by programming at all. Wouldn't it be better tackled by fixing your data-entry methods?
#!/home/bbq/bin/perl
# Trust no1!
Your point is well taken.
Yes, entering clean data is easier than cleaning it later. However, I'm dealing with a large, established
database with over ten million names. Errors do creep in over time...
Re: Proper case for names
by gnat (Beadle) on Sep 02, 2000 at 04:35 UTC
If you're sanitizing a database, and you know that the
vast majority of words are capitalized correctly, then
the problem is easy to solve. Go through the database,
and for each name generate the lowercase version. Keep
track of how often each differently cased form occurs
for the same lowercase form ("mckenzie" vs "McKenzie"
vs "Mckenzie"). The ones that rarely occur are the
mistakes; the ones that often occur are correct.
You hope. :-)
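A minimal sketch of the counting approach described above. The sub name `majority_casing` and the sample surnames are hypothetical. Spellings are grouped by their lowercase form, each spelling is counted, and the most frequent one is reported as the probable correct casing.

```perl
use strict;
use warnings;

# Count every cased spelling of each word, grouped by its lowercase form,
# and return the most frequent spelling for each group.
sub majority_casing {
    my @words = @_;
    my %count;                          # lc form -> { spelling -> count }
    $count{ lc $_ }{$_}++ for @words;

    my %best;                           # lc form -> most common spelling
    for my $key (keys %count) {
        my $variants = $count{$key};
        # Highest count wins; ties broken by string order so output is stable.
        my ($winner) = sort {
            $variants->{$b} <=> $variants->{$a} or $a cmp $b
        } keys %$variants;
        $best{$key} = $winner;
    }
    return \%best;
}

# Hypothetical sample: surnames pulled from the database.
my @surnames = ((('McKenzie') x 40), (('Mckenzie') x 3), 'mckenzie',
                (('Smith') x 10), 'SMITH');
my $best = majority_casing(@surnames);
for my $key (sort keys %$best) {
    printf "%-10s -> %s\n", $key, $best->{$key};
}
```

Any record whose spelling differs from `$best->{lc $word}` is a candidate for review rather than something to overwrite automatically, since a rare spelling can still be a legitimate personal preference.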
Nat
Re: Proper case for names
by fundflow (Chaplain) on Sep 01, 2000 at 18:02 UTC
$name =~ s/(\w+)/\u$1/g;
seems to work for most cases. Because \u uppercases only the first letter of each word and leaves the rest alone,
it won't change input whose case is already correct (e.g. McArthur remains McArthur).
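The substitution above, wrapped in a sub so its behavior on the thread's examples is easy to check. The sub name `ucfirst_words` is hypothetical. It shows both sides of the claim: correctly cased input survives, but all-lowercase input yields "Iii" rather than "III", and all-caps input comes back unchanged, so it suits data that is already mostly right.

```perl
use strict;
use warnings;

# The substitution from the reply: \u uppercases only the first
# character of each \w+ run and leaves the remaining characters alone.
sub ucfirst_words {
    my ($name) = @_;
    $name =~ s/(\w+)/\u$1/g;
    return $name;
}

print ucfirst_words('bobby McArthur'), "\n";         # Bobby McArthur
print ucfirst_words('fred smith-barney iii'), "\n";  # Fred Smith-Barney Iii
print ucfirst_words('FRED SMITH'), "\n";             # FRED SMITH
```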