bobdole has asked for the wisdom of the Perl Monks concerning the following question:

I searched Google and on here and came across Text::GenderFromName, but most of the data we use for reporting consists of Spanish names, so Text::GenderFromName doesn't appear to be the best fit, as it deals predominantly with American names.

This isn't something we HAVE to have, but the more demographic information the better. I'm just wondering if there is anything else out there that may help.

Replies are listed 'Best First'.
Re: Determining gender based on first name
by hossman (Prior) on Jan 02, 2008 at 22:52 UTC

    I'd never heard of Text::GenderFromName until reading your post, but skimming the docs has shown me that it has 2 very clearly documented "American" biases...

    1. The raw data is based on US SSA sampling
    2. It uses Text::DoubleMetaphone as a fall back in some cases

    You can solve the first bias by using your own list of names -- I'm sure someone somewhere online has a (free) list of common Spanish names. You can just assume any name in only one list has a weight of "1", and if a name is in both lists, eyeball it and guess a weight based on your personal opinions.

    The second bias may not actually be that bad (I don't know how well the Double Metaphone algorithm does with Spanish names), but it can easily be turned off (the perldoc even has an example of doing this), giving you just the simple weighted comparison.

    (of course, if you are providing your own name data, and not using metaphones, you are basically just using it to do two hash lookups and pick the one with a higher value ... which is about 2 lines of code)
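The "two hash lookups" idea above can be sketched roughly as follows. This is a minimal illustration, not Text::GenderFromName's implementation: the `%female` and `%male` tables, their weights, and the `guess_gender` name are all hypothetical stand-ins for whatever name lists you collect, and the `//` operator assumes Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical weight tables; in practice, load these from your own name lists.
my %female = (maria => 1, consuelo => 1, amparo => 1, guadalupe => 0.9);
my %male   = (jose  => 1, carlos   => 1, guadalupe => 0.1);

# Return 'f' or 'm' for the higher weight, or nothing (undef in scalar
# context) when the name is in neither list or the weights tie.
sub guess_gender {
    my ($name) = @_;
    my $key = lc $name;
    my $f = $female{$key} // 0;    # missing name counts as weight 0
    my $m = $male{$key}   // 0;
    return if $f == $m;
    return $f > $m ? 'f' : 'm';
}

for my $name (qw(Maria Carlos Guadalupe Xochitl)) {
    my $guess = guess_gender($name) // 'unknown';
    print "$name: $guess\n";
}
```

Returning nothing for unknown or tied names, rather than forcing a guess, leaves it to the caller to decide what "no data" should mean in a report.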

      Sounds like a plan, thanks.

      A quick search of Google provided plenty of names.
Re: Determining gender based on first name
by Your Mother (Archbishop) on Jan 03, 2008 at 05:06 UTC

    Not trying to be a problem but "American names?" Like "Freedom" or "Shaniqua?" "Anglo" ne "American" ne "German" etc etc etc. As someone whose mail comes chronically with the wrong salutation I must chime in on the general issue: "Boooooo!"

    That said, Spanish is actually a language with gender in it. You should be able to get a pretty good starting point from little more than (a|o)\z and then grow it from there.

      You should be able to get a pretty good starting point from little more than (a|o)\z and then grow it from there.
      Unfortunately, there are plenty of Spanish feminine first names ending with an "o": Amparo, Consuelo, Olvido...
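One way to "grow it from there" is to check an exception list before falling back on the ending. The sketch below assumes a hypothetical `%exception` table, seeded only with the three names mentioned above, and a hypothetical `guess_by_ending` function; it has no opinion on names that end in anything other than "a" or "o".

```perl
use strict;
use warnings;

# Hypothetical exception list; extend it as counterexamples turn up.
my %exception = (amparo => 'f', consuelo => 'f', olvido => 'f');

# Known exceptions win; otherwise guess from the final letter.
sub guess_by_ending {
    my ($name) = @_;
    my $key = lc $name;
    return $exception{$key} if exists $exception{$key};
    return 'f' if $key =~ /a\z/;
    return 'm' if $key =~ /o\z/;
    return;    # no opinion for other endings
}

for my $name (qw(Ana Pedro Consuelo Carmen)) {
    my $guess = guess_by_ending($name);
    printf "%s: %s\n", $name, defined $guess ? $guess : 'unknown';
}
```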
Re: Determining gender based on first name
by Old_Gray_Bear (Bishop) on Jan 03, 2008 at 19:09 UTC
    Not to rain on your parade, but.

    Consider 'Maria' -- is that a masculine name or a feminine name? It all depends on the context. I would bet that 'Maria Horatio Lopez' and 'Maria Consuela Lopez' would be distinguishable algorithmically, but how about just plain 'Maria Lopez'?

    The classic paradigm is 'Anne' -- as in Anne de Montmorency, Marshal of France in the 16th century. *He* was one of the French notables who fought at Pavia, and survived. Now, granted his is the only masculine 'Anne' that I can find, but it only takes one exception to make your life unduly hard.

    A modern paradigm is 'Tracy' -- I know three people of that name and spelling; one male, two female.

    And people tend to be very touchy about their names:
    A man's name is not like a mantle, which merely hangs about him, and which one perchance may safely twitch and pull, but a perfectly fitting garment, which like the skin has grown over and over him, at which one cannot rake and scrape without injuring the man himself. -- Goethe
    Mis-identifying the sex on the basis of the name alone can be a serious disaster.

    One of my female Tracys is a serious contralto. She answers the phone with 'Tracy here', giving the person on the other end of the line precious little in the way of cues as to gender. She says that over half the time she gets mis-identified: the salesman uses "Mr. xxxx" for the next sentence, and then gets flustered when he figures out that he has made a mistake. She used to get seriously bent about this, but she grew out of that. Now she finds it hilarious.

    Have fun on this project, report back often and let us know what you've found. And build a CPAN module from your findings.

    ----
    I Go Back to Sleep, Now.

    OGB

      Or Marion Robert Morrison, better known as John Wayne.
      Getting it wrong may cause trouble!
      Consider 'Maria' -- is that a masculine name or a feminine name? It all depends on the context.
      I'd say that Maria is always feminine. If that's not the case, please provide further details or an example. However, what about Alex? You can only guess... ;)
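For names like Alex or Tracy, one option is to refuse to guess unless the data is lopsided enough. The sketch below assumes per-name counts are available. The numbers in `%count` are made up purely for illustration, and `confident_gender` and its 0.9 default threshold are hypothetical choices, not anything taken from an existing module.

```perl
use strict;
use warnings;

# Made-up counts of how often each name was seen per gender.
my %count = (
    maria => { f => 990, m => 10 },
    alex  => { f => 400, m => 600 },
    tracy => { f => 650, m => 350 },
);

# Return 'f' or 'm' only when that gender's share of the observations
# reaches $min_share; otherwise return nothing and let the caller decide.
sub confident_gender {
    my ($name, $min_share) = @_;
    $min_share //= 0.9;    # assumed default threshold
    my $c = $count{ lc $name } or return;
    my $total = $c->{f} + $c->{m};
    return unless $total;
    my ($best) = sort { $c->{$b} <=> $c->{$a} } qw(f m);
    return unless $c->{$best} / $total >= $min_share;
    return $best;
}

for my $name (qw(Maria Alex Tracy)) {
    my $guess = confident_gender($name);
    printf "%s: %s\n", $name, defined $guess ? $guess : 'unknown';
}
```

Lowering the threshold trades fewer "unknown" answers for more wrong salutations, so where to set it depends on how costly a mistake is for your reports.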