Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:
I have searched the archive for this wisdom without success. I thought I had a simple problem but don't seem to find help anywhere.
I have a file of full names
example
John Hand Brown
Cindy Jones
Thomas More
and I want to change this to a list by last name, but I cannot seem to find how to do it anywhere
example
Brown, John Hand
Jones, Cindy
More, Thomas
Can you help?
gde
Edited by Chady -- added code tags.
Re: Changing from full name to last, first mid
by Old_Gray_Bear (Bishop) on May 24, 2004 at 21:29 UTC
I have been down this long, painful road. First, look at Lingua::EN::NameParse; it can help. Second, consider the following names:
- J R Jones
- JR Jones (Yes, no vowel in the first name. I have an uncle who is WD Bascombe, III; it's that way on his birth-certificate, and no, his father was not WD, Jr....)
- J. R. R. Tolkien
- Inez dela Vega y Montoya
- Louis de la Salle
- Tiger (Single name, not first, not last, just 'name')
- J. R. Jones, III
sigh
----
I Go Back to Sleep, Now.
OGB
Luckily, my quest was not this complicated... The few Jr. and III cases could be easily handled, and it was a one-shot deal.
Thank you for the glimpse into the abyss...
Re: Changing from full name to last, first mid
by fletcher_the_dog (Friar) on May 24, 2004 at 21:14 UTC
From the command line you could just do:
perl -p -i -e 's/^(.*?)\s*(\w+)\s*$/$2, $1\n/' list.txt
Update
This assumes that each name is on its own line. If several names share a line, you are going to have a very hard time determining where one name ends and another begins.
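For readers new to regular expressions, here is a minimal sketch of the same substitution spelled out with the /x modifier so each piece can carry a comment. The sub name last_first_regex is hypothetical and not part of the original answer; the pattern itself is the one from the one-liner above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): the one-liner's substitution,
# written with /x so that whitespace and comments are allowed in the pattern.
sub last_first_regex {
    my ($line) = @_;
    $line =~ s/
        ^ (.*?)    # group 1: everything before the last word (lazy match)
        \s*        # whitespace separating it from the last word
        (\w+)      # group 2: the final run of word characters
        \s* $      # trailing whitespace, including the newline
    /$2, $1\n/x;
    return $line;
}

print last_first_regex($_)
    for "John Hand Brown\n", "Cindy Jones\n", "Thomas More\n";
```

Because the surname is matched with \w+, this only suits simple data like the sample in the question; a hyphenated or multi-word surname would not come out whole.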
The s/ code did it...I guess I need to look more at regular expressions...thank you
gde
With your help I created my first "useful" Perl program... I'm so happy. I have found inner peace.
gde
Re: Changing from full name to last, first mid
by hardburn (Abbot) on May 24, 2004 at 20:46 UTC
John Hand Brown Cindy Jones Thomas More
Knowing nothing else about your data, I look at this and decide that there are three separate names here, which are "John Hand Brown", "Cindy Jones", and "Thomas More". But it's quite possible that the names are actually "John Hand", "Brown Cindy", and "Jones Thomas More", or perhaps some other combination. Consider that the human brain is much, much better at resolving ambiguity than computers are (or at least at producing a solution that is closer to reality).
If the data above is representative of what you have, then I don't think you're going to find a solution with even an acceptable failure rate.
----
send money to your kernel via the boot loader.. This and more wisdom available from Markov Hardburn.
Re: Changing from full name to last, first mid
by CountZero (Bishop) on May 24, 2004 at 20:33 UTC
It would be helpful if you could show us the format of the file with the names. If it is just a long list of names, one after another, without any delimiters, I'm afraid there is no solution for your problem.
CountZero "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law
Re: changing full name to last, first mid
by mifflin (Curate) on May 24, 2004 at 20:51 UTC
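use strict;
use warnings;

while (<DATA>) {
    # split ' ' skips leading whitespace and collapses runs of whitespace
    my @name = split ' ';
    next unless @name;    # skip blank lines
    # the last word is taken as the surname; the rest stays in order
    print pop(@name), ', ', join(' ', @name), "\n";
}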
__DATA__
John Hand Brown
Cindy Jones
Thomas More
produces the output...
Brown, John Hand
Jones, Cindy
More, Thomas
Other gotchas:
Prefixes for last name: De, Del, Dela, Di, Du, El, La, Le, Mac, Mc, San, St., Van, Vanden, Vander, Ver, Von, etc.
Suffixes for last name: II, III, IV, Sr, Jr, MD, PhD, etc.
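A minimal sketch of how the two lists above could be folded into the split-and-pop approach. The sub name last_first_affixes and the "Last, First, Suffix" output order are assumptions made for illustration, and the tables hold only the entries listed above (with "St." stored as "st" because dots are stripped before lookup).

```perl
use strict;
use warnings;

# Lookup tables built from the lists above; real data would need more entries.
my %PREFIX = map { lc($_) => 1 }
    qw(de del dela di du el la le mac mc san st van vanden vander ver von);
my %SUFFIX = map { lc($_) => 1 } qw(II III IV Sr Jr MD PhD);

# Hypothetical helper: returns "Last, First Middle" (plus ", Suffix" if any).
sub last_first_affixes {
    my ($full) = @_;
    my @words = split ' ', $full;
    return '' unless @words;

    # Peel trailing suffixes such as "Jr." or "III"; dots and commas are ignored.
    my @suffix;
    while (@words > 1) {
        (my $key = lc $words[-1]) =~ s/[.,]//g;
        last unless $SUFFIX{$key};
        unshift @suffix, pop @words;
    }
    $words[-1] =~ s/,$//;    # drop the comma that preceded a suffix

    # Surname = last word plus any prefixes directly before it,
    # always leaving at least one word as the given name.
    my @last = (pop @words);
    while (@words > 1) {
        (my $key = lc $words[-1]) =~ s/\.//g;
        last unless $PREFIX{$key};
        unshift @last, pop @words;
    }

    my $out = join ' ', @last;
    $out .= ', ' . join(' ', @words)  if @words;
    $out .= ', ' . join(' ', @suffix) if @suffix;
    return $out;
}

print last_first_affixes($_), "\n"
    for 'Louis de la Salle', 'J. R. Jones, III', 'John Hand Brown', 'Tiger';
```

This remains a heuristic: a name such as "Inez dela Vega y Montoya" still comes out wrong, because "y" is not in the prefix list and the surname really begins earlier.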
Re: Changing from full name to last, first mid
by davido (Cardinal) on May 25, 2004 at 05:25 UTC
You have a challenge ahead of you, honestly.
Consider the following names:
Mike Brown => Easy.
John Paul Williams => Easy too.
Biff Mc Fly => This is harder.
Peter David Van Den Berghe => Now what do you have in mind?
Catherine Zeta-Jones => Got a rule for this one?
The point is, what constitutes a last name? In one of the examples above, Mc Fly is the last name. In another example, Van Den Berghe is the last name. ...You would never say, "Hello Mr. Berghe" ... It's Mr. Van Den Berghe, always. Yet how are you going to come up with a hard-and-fast set of rules that takes into account all possible forms of last names?
It's hard. You would almost need a known last-name lookup table to match against.
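A rough sketch of the lookup-table idea, assuming such a table exists. The %KNOWN_SURNAME hash and the sub name last_first_lookup are hypothetical; the table here holds only the surnames from the examples above. The longest matching tail of the name wins, and anything not in the table falls back to "last word is the surname".

```perl
use strict;
use warnings;

# Hypothetical table of known surnames; in practice it would be loaded
# from whatever reference data is available.
my %KNOWN_SURNAME = map { lc($_) => 1 }
    ('Brown', 'Williams', 'Mc Fly', 'Van Den Berghe', 'Zeta-Jones');

sub last_first_lookup {
    my ($full) = @_;
    my @words = split ' ', $full;
    return '' unless @words;

    # Try the longest candidate surname first, always leaving
    # at least one word as the given name.
    for my $start (1 .. $#words) {
        my $candidate = join ' ', @words[$start .. $#words];
        next unless $KNOWN_SURNAME{lc $candidate};
        return "$candidate, " . join(' ', @words[0 .. $start - 1]);
    }

    # Fallback: treat the final word as the surname.
    my $last = pop @words;
    return @words ? "$last, " . join(' ', @words) : $last;
}

print last_first_lookup($_), "\n"
    for 'Mike Brown', 'Biff Mc Fly', 'Peter David Van Den Berghe',
        'Catherine Zeta-Jones';
```

The quality of the result depends entirely on the table, so this shifts the problem to collecting surnames rather than solving it.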