Look at Lingua::EN::NameCase and Lingua::EN::NameParse.
Cheers,
KM | [reply] |
Merlyn said it in this node, RE: Uppercase First Letter w/exceptions, and I'll repeat it: you have names like O'Reilly to test for, and sometimes Mcphillips is correct, depending on the preference of the user. For this reason I don't think that a module has been written yet. You can always make one. | [reply] |
If you wanted to get closer to your objective, you should shoot for "Proper case for names in the English language". IMHO, this is not a problem to be solved by Perl, or by programming at all. Wouldn't it be better tackled by fixing your data-entry methods?
#!/home/bbq/bin/perl
# Trust no1!
| [reply] |
Your point is well taken.
Yes, entering clean data is easier than cleaning it later. However, I'm dealing with a large, established
database of over ten million names. Errors do creep in over time...
| [reply] |
If you're sanitizing a database, and you know that the
vast majority of names are capitalized correctly, then
the problem is easy to solve. Go through the database,
and for each name generate the lowercase version. Keep
track of how many differently-cased forms correspond
to each common lowercase form ("mckenzie" vs "McKenzie"
vs "Mckenzie"). The forms that rarely occur are the
mistakes; the ones that occur often are correct.
You hope. :-)
Nat | [reply] |
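A rough sketch of the counting idea above, in case it helps. The sub names (count_variants, preferred_spellings) and the sample list are made up for illustration; a real run would read names from the database instead of an array, and the "most frequent wins" rule is only as good as the assumption that most entries are already right.

```perl
use strict;
use warnings;

# Tally every cased spelling under its lowercase key:
# $seen->{mckenzie}{McKenzie} = 3, $seen->{mckenzie}{Mckenzie} = 1, ...
sub count_variants {
    my @names = @_;
    my %seen;
    $seen{ lc $_ }{$_}++ for @names;
    return \%seen;
}

# For each lowercase key, pick the most frequent spelling as the
# presumed-correct one (ties broken alphabetically so results are stable).
sub preferred_spellings {
    my ($seen) = @_;
    my %best;
    for my $key (keys %$seen) {
        my $variants = $seen->{$key};
        my ($top) = sort { $variants->{$b} <=> $variants->{$a} || $a cmp $b }
                    keys %$variants;
        $best{$key} = $top;
    }
    return \%best;
}

# Hypothetical sample data standing in for the real name column.
my @names = qw(McKenzie McKenzie McKenzie Mckenzie mckenzie Smith Smith);
my $seen  = count_variants(@names);
my $best  = preferred_spellings($seen);

for my $key (sort keys %$best) {
    # Report the minority spellings as candidates for correction.
    my @suspects = grep { $_ ne $best->{$key} } sort keys %{ $seen->{$key} };
    print "$key: keep $best->{$key}",
          (@suspects ? "; suspect @suspects" : ""), "\n";
}
```

With ten million rows you would probably want to review the suspects by hand (or at least set a minimum count threshold) before rewriting anything, since a rare spelling can still be someone's real preference.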
s/(\w+)/\u$1/g
seems to work for most cases, and if the input case is already correct,
it won't change it (e.g. McArthur remains McArthur)
| [reply] [d/l] |
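For anyone who wants to see what that substitution does and doesn't do, here is a small sketch. The wrapper sub ucfirst_words and the sample names are just for illustration. Note that it only uppercases the first letter of each \w+ run, so it cannot restore an interior capital that was lost.

```perl
use strict;
use warnings;

# Uppercase the first character of every \w+ run; leave the rest untouched.
sub ucfirst_words {
    my ($name) = @_;
    (my $fixed = $name) =~ s/(\w+)/\u$1/g;
    return $fixed;
}

# Illustrative inputs only.
for my $name ("john smith", "McArthur", "mcarthur", "o'reilly") {
    printf "%-12s => %s\n", $name, ucfirst_words($name);
}
# john smith   => John Smith
# McArthur     => McArthur
# mcarthur     => Mcarthur
# o'reilly     => O'Reilly
```

The apostrophe splits "o'reilly" into two \w+ runs, which happens to give the right answer here, but the same behaviour would also capitalize the letter after any other apostrophe.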