kieps has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to take a person's full name, "Firstname Middlename Lastname", and invert it to "Lastname, Firstname Middlename". I'm reading in a file and trying to write out to a SQL script.

Here is the data:

Literature|30001f3d|John M. Doe

This is what I'm trying to get:

INSERT INTO Books (Title,FileID,author,authInv) VALUES ('Literature','30001f3d','John M. Doe','Doe, John M.');

I'm new to Perl, so I don't know how to write a script to invert it. I've seen code before where you take the full name, start the counter at the end, go until you hit a space, and reformat the field to add the comma, but I just can't make it work. Any help would be appreciated. Thanks.

Here is the code:

print "What is the directory where we'll be working: ";
$directory = <STDIN>;
chomp ($directory);

print "What is the file name, with a file extension: ";
$file = <STDIN>;
chomp ($file);

$originalfile = "$directory/$file";
open(FIRST, "<$originalfile" ) || die "I cannot open $originalfile. $!";
@contents = <FIRST>;
close(FIRST);

$new = "$directory/SQL-Insert-Titles.sql";
unlink($new);
open(SECOND, ">$new") || die "I cannot open $new. $!";

foreach $line (@contents) {
    @splitline = split /\|/, $line;
    $title   = $splitline[0];
    $file    = $splitline[1];
    $author  = $splitline[2];
    $authinv = $splitline[3];
    chomp($title);
    chomp($file);
    print SECOND "INSERT INTO Books (Title,FileID,author,authInv) VALUES ('$title','$file','$author','$authinv')\;\n";
}

close(SECOND);
print "\n\tDone.";
Edited 2005-04-08 by Ovid

Re: Inverting full names
by ikegami (Patriarch) on Apr 08, 2005 at 20:30 UTC

    This has come up before. Have you tried doing a Super Search? One of the things that was mentioned is that you need to review the results manually, because

    • some people have no middle name,
    • some people have multiple middle names,
    • some people have surnames with multiple words, e.g. Riki Le Cotey, and
    • (in your case) some books have multiple authors.
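
A minimal sketch of the "split at the last space" approach described in the question, to show why the cases listed above still need a manual review. The helper name `invert_name` is hypothetical and not from the thread; it assumes one author per field and a single-word surname.

```perl
use strict;
use warnings;

# Naive inversion: whatever follows the last run of whitespace is
# treated as the surname. Hypothetical helper, not from the thread.
sub invert_name {
    my ($full) = @_;
    $full =~ s/^\s+|\s+$//g;                       # trim surrounding whitespace
    my ($rest, $last) = $full =~ /^(.*\S)\s+(\S+)$/
        or return $full;                           # single word: nothing to invert
    return "$last, $rest";
}

for my $name ('John M. Doe', 'John Doe', 'John Quincy Adams Doe',
              'Riki Le Cotey', 'John Doe and Jane Roe') {
    print "$name => ", invert_name($name), "\n";
}
```

The first three names come out as expected, but 'Riki Le Cotey' becomes 'Cotey, Riki Le' and the two-author field becomes 'Roe, John Doe and Jane', which is why the output has to be checked by hand.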
Re: Inverting full names
by Grygonos (Chaplain) on Apr 08, 2005 at 20:53 UTC

    I just performed a similar exercise for my employer. We were receiving names in the format "last <suffix> first <middle>", where <> denotes an optional part.

    In my experience it required more than one regex and a good knowledge of your data. For example, your John M. Doe example could be done as follows:

    use strict;
    use warnings;

    my @names = ('John-Boy M. Doe', 'John Doe', 'John St. Doe', 'John St Doe',
                 'John M. St. Doe', 'John M. O Doe', 'John M. O\'Doe');

    foreach my $name (@names) {
        NAME_TEST: {
            # Prefix-aware pattern first: surname starts with O or St
            $name =~ m{^([\w\-\']+)\s*(\w*\.*)\s((?:O|St)(?:\'|\.)*\s*[\w\-\']+)}
                && do { print 'First/Middle/(Prefix)Last';
                        print $1.'-'.$2.'-'.$3."\n\n";
                        last NAME_TEST; };
            # Generic pattern second: plain first/middle/last, no prefix handling
            $name =~ m{^([\w\-\']+)\s*(\w*\.*)\s([\w\-\']+)}
                && do { print 'First/Middle/Last';
                        print $1.'-'.$2.'-'.$3."\n\n";
                        last NAME_TEST; };
        }
    }

    As you can see, I included some other examples of things you'll have to deal with. Running these regexen in a certain order is important: if you take my example and swap them, it gives incorrect results, because the second one finds what it thinks is a first/middle/last name (it doesn't know about the predefined prefixes and such). Hope some of that helps.
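
The ordered-regex idea can be taken one step further to produce the inverted string the question asks for. This is only a sketch under assumptions: the patterns are simplified variants, not the ones from the post above, the prefix list (O, St) is just the one used in those examples, and `invert_with_rules` is a hypothetical name. Anything that matches no rule is returned as undef so it can be reviewed by hand.

```perl
use strict;
use warnings;

# Rules are tried in order; the prefix-aware rule must come first.
# Each rule captures (first, optional middle, last).
my @rules = (
    qr{^([\w'-]+)\s+(?:(\w+\.?)\s+)?((?:O'|O\s+|St\.?\s+)[\w'-]+)$},  # prefixed surname
    qr{^([\w'-]+)\s+(?:(\w+\.?)\s+)?([\w'-]+)$},                      # plain surname
);

# Hypothetical helper: returns "Last, First Middle", or undef if no rule fits.
sub invert_with_rules {
    my ($full) = @_;
    for my $re (@rules) {
        if (my ($first, $middle, $last) = $full =~ $re) {
            my $given = join ' ',
                grep { defined($_) && length($_) } $first, $middle;
            return "$last, $given";
        }
    }
    return;    # no rule matched: leave for manual review
}

for my $name ('John-Boy M. Doe', 'John St. Doe', "John M. O'Doe",
              'John Quincy Adams Doe') {
    my $inverted = invert_with_rules($name);
    print "$name => ",
        (defined $inverted ? $inverted : '(needs manual review)'), "\n";
}
```

With the rules in this order, 'John St. Doe' gives 'St. Doe, John'. If only the plain rule were tried, 'St.' would be taken as a middle name, which is the ordering problem described above.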

Re: Inverting full names
by doom (Deacon) on Apr 08, 2005 at 23:59 UTC
    Have you taken a look at Lingua::EN::NameParse? It looks like a good try at solving this problem: Lingua::EN::NameParse
Re: Inverting full names
by Cody Pendant (Prior) on Apr 09, 2005 at 08:53 UTC
    VALUES ('Literature','30001f3d','John M. Doe','Doe, John M.');

    You're trying to do what? You're trying to put "John M. Doe" and "Doe, John M." into your database?

    Not to be rude, but that's crazy.

    You need to separate out the Doe, the John and the M and put them into one field each. How you do that is another problem, but that's not a sensible database design.



    ($_='kkvvttuubbooppuuiiffssqqffssmmiibbddllffss')
    =~y~b-v~a-z~s; print

      Unless you know how the data is being used, it's impossible to know what is, or isn't, a good database design. There are many times when denormalizing the data makes good sense, and when you wouldn't store the component parts.

      If they don't need to distinguish between the first and middle names, it's easier to just not bother with that step. If those two fields are used for sorting the records, and the database doesn't handle derived indexes, then it makes sense to store the full version, possibly along with the name broken down into pieces. If they never need the name broken down into its component parts (e.g., they're only using it for display and/or sorting purposes), then they may not need to maintain them.

      Yes, they would have more flexibility in the future by storing it in terms of first/middle/last, but if you're tuning for reads, it's not efficient to put everything back together for every access.
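
As a rough illustration of storing the full version along with the pieces, the sketch below writes one INSERT that carries the display form, the inverted form, and the component parts. The columns authFirst, authMiddle and authLast are hypothetical and not part of the table in the question, as are the helper names `sql_quote` and `author_insert`. Single quotes are doubled, which is the standard way to embed them in a SQL string literal.

```perl
use strict;
use warnings;

# Double any single quotes so the value is a valid SQL string literal.
sub sql_quote {
    my ($value) = @_;
    $value =~ s/'/''/g;
    return "'$value'";
}

# Build one INSERT holding the derived strings and the component parts.
# authFirst/authMiddle/authLast are hypothetical column names.
sub author_insert {
    my (%f) = @_;
    my $middle   = defined $f{middle} ? $f{middle} : '';
    my $given    = join ' ', grep { length } $f{first}, $middle;
    my $full     = "$given $f{last}";       # display form
    my $inverted = "$f{last}, $given";      # sort form
    my @values = map { sql_quote($_) }
        $f{title}, $f{file_id}, $full, $inverted,
        $f{first}, $middle, $f{last};
    return 'INSERT INTO Books '
         . '(Title,FileID,author,authInv,authFirst,authMiddle,authLast) VALUES ('
         . join(',', @values) . ');';
}

print author_insert(
    title => 'Literature', file_id => '30001f3d',
    first => 'John', middle => 'M.', last => "O'Doe",
), "\n";
```

Whether the three extra columns are worth keeping depends on how the data is read, as argued above; dropping them from the column list leaves the denormalized two-column version.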

      Personally, I found it very rude to say that something wasn't sensible, and that something specific needed to be done, without an intimate understanding of the reasons for its existence.