in reply to Gender prediction

Re^2: Gender Prediction
by moritz (Cardinal) on Jun 06, 2007 at 09:50 UTC
    PerlMonks is not a "we rewrite your code" service.

    If there is anything in particular that you don't like about the code, tell us so.

    And tell us what you tried, and why you are not satisfied with it.

Re^2: Gender Prediction
by robot_tourist (Hermit) on Jun 06, 2007 at 10:55 UTC

    My untrained eye can't spot any particular code smells. That said, I have learned to use fetchrow_hashref: you don't have to care about the order of the returned columns because you just use the field names, though it does force you to learn about references. The code you have is already pretty concise and would only need a little more formatting to be more readable.
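
    A minimal sketch of the fetchrow_hashref point. So that it runs without a database, MockSth below is a hypothetical stand-in that imitates only the two fetch methods; with a real DBI statement handle, the fetchrow_hashref loop in describe_rows should look the same. The column names email and gender are assumptions for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a DBI statement handle so the example runs
# without a database. Only the two fetch methods used below are imitated.
package MockSth;

sub new {
    my ($class, $columns, @rows) = @_;
    return bless { columns => $columns, rows => [@rows] }, $class;
}

# Returns the next row as a list in column order, or an empty list when done.
sub fetchrow_array {
    my ($self) = @_;
    my $row = shift @{ $self->{rows} } or return;
    return @$row;
}

# Returns the next row as a hash reference keyed by column name, or undef.
sub fetchrow_hashref {
    my ($self) = @_;
    my $row = shift @{ $self->{rows} } or return;
    my %record;
    @record{ @{ $self->{columns} } } = @$row;
    return \%record;
}

package main;

# Collect "email (gender)" strings using hash references: the code names
# the fields it wants and does not depend on the column order.
sub describe_rows {
    my ($sth) = @_;
    my @lines;
    while (my $row = $sth->fetchrow_hashref()) {
        push @lines, "$row->{email} ($row->{gender})";
    }
    return @lines;
}

my $sth = MockSth->new(
    [ 'gender', 'email' ],    # column order is deliberately "unexpected"
    [ 'f', 'jane.doe@example.com' ],
    [ 'm', 'john_smith@example.com' ],
);
print "$_\n" for describe_rows($sth);
```

    With fetchrow_array the same loop would have to know that gender comes first and email second; with the hash reference, swapping the columns in the query changes nothing in the loop body.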

    Also note that if you don't need the code you commented out, get rid of it or it will accumulate. I know it's something I have to fix in my own code. The odd #print ... is OK, but if you have version control you don't need big blocks of code commented out 'just in case you need to uncomment it one day'.

    How can you feel when you're made of steel? I am made of steel. I am the Robot Tourist.
    Robot Tourist, by Ten Benson

Re^2: Gender Prediction
by Limbic~Region (Chancellor) on Jun 06, 2007 at 12:23 UTC
    samsonp81,
    I understand what it is like to want to write code that is above my head. While the intent of Re: Refactoring a large script is not applicable to your situation, a lot of the advice inside is. The following is unfinished code that should give you some help as well.
    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;

    # Establish connections to your respective databases
    my $email_dbh;
    my $email_sth;
    my $names_dbh;
    my $names_sth;

    open(my $fh, '>', 'bad.email') or die "Unable to open 'bad.email' for writing: $!";

    while (my ($email) = $email_sth->fetchrow_array()) {
        my @names = guess_names($email);
        if (! @names) {
            print $fh $email, "NONE FOUND\n";
            next;
        }
        my $found;
        for my $name (@names) {
            my ($gender) = $names_sth->execute($name);
            if (defined $gender) {
                print join "\t", $email, $name, $gender;
                print "\n";
                $found = 1;
                last;
            }
        }
        if (! $found) {
            print $fh $email, (join ", ", @names), "\n";
        }
    }

    # After each run, check 'bad.email' to see if you can refine your patterns

    sub guess_names {
        my ($email) = @_;
        my @guess;

        # Strip domain from email address
        $email =~ s///;

        # See if it might be first.last, first_last, first-last
        if ($email =~ //) {
            push @guess, $1, $2;
        }

        # See if it might be first name followed by last initial
        push @guess, substr($email, 0, length($email) - 1);

        # Add more patterns here

        return @guess;
    }
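
    The patterns in guess_names above are left empty. One possible way to fill them in is sketched below; both regexes are assumptions made for illustration, not the patterns the post had in mind, and the two extra guards for empty or one-character local parts are additions.

```perl
use strict;
use warnings;

# Sketch of guess_names with the empty patterns filled in.
# Both regexes are assumptions, not the original author's patterns.
sub guess_names {
    my ($email) = @_;
    my @guess;

    # Strip domain from email address (everything from the first '@')
    $email =~ s/\@.*\z//s;
    return @guess if $email eq '';

    # See if it might be first.last, first_last, first-last
    if ($email =~ /\A([a-z]+)[._-]([a-z]+)\z/i) {
        push @guess, $1, $2;
    }

    # See if it might be first name followed by last initial
    push @guess, substr($email, 0, length($email) - 1) if length($email) > 1;

    return @guess;
}

print join(", ", guess_names('jane.doe@example.com')), "\n";
print join(", ", guess_names('johns@example.com')), "\n";
```

    As in the original, the "drop the last character" guess is always appended, so an address like jane.doe also yields the unlikely candidate jane.do. That is harmless as long as the name lookup simply finds no match for it.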

    Cheers - L~R

Re^2: Gender Prediction
by lima1 (Curate) on Jun 06, 2007 at 09:58 UTC
    i need to enhance it further.. am looking for a better code.

    Yeah, like everybody else here. But what exactly are your problems? I suggest compiling a training/test set and then thinking about why your algorithm does not classify some addresses correctly. If you then have a real question, ask it here and don't forget to post the relevant data.
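
    A rough sketch of what checking against a labelled test set could look like. The classify function and its tiny name table are hypothetical placeholders for whatever algorithm is being evaluated, and the addresses and labels are made up.

```perl
use strict;
use warnings;

# Hypothetical classifier used only for illustration: guesses a gender from
# the local part of an address using a tiny hard-coded name table.
my %KNOWN = (jane => 'f', mary => 'f', john => 'm', peter => 'm');

sub classify {
    my ($email) = @_;
    my ($local) = $email =~ /\A([^\@]*)\@/ or return 'unknown';
    my ($first) = split /[._\-0-9]+/, lc $local;
    return $KNOWN{ $first // '' } // 'unknown';
}

# Run the classifier over labelled examples; return accuracy and the misses.
sub evaluate {
    my ($classifier, @labelled) = @_;
    my ($correct, @misses) = (0);
    for my $example (@labelled) {
        my ($email, $expected) = @$example;
        my $got = $classifier->($email);
        if ($got eq $expected) { $correct++ }
        else { push @misses, [ $email, $expected, $got ] }
    }
    my $accuracy = @labelled ? $correct / @labelled : 0;
    return ($accuracy, \@misses);
}

my @test_set = (
    [ 'jane.doe@example.com',   'f' ],
    [ 'john_smith@example.com', 'm' ],
    [ 'jsmith42@example.com',   'm' ],    # initial only: likely to be missed
);

my ($accuracy, $misses) = evaluate(\&classify, @test_set);
printf "accuracy: %.2f\n", $accuracy;
printf "missed %s: expected %s, got %s\n", @$_ for @$misses;
```

    The list of misses is the useful part: it is the "relevant data" to post along with a concrete question.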

Re^2: Gender Prediction
by samsonp81 (Initiate) on Jun 08, 2007 at 11:07 UTC
    Thanks for your help! I am done with my code. The Text::GenderFromName module was helpful. Here's the code I have written.
    #!/usr/bin/perl
    use DBI;
    my $dbh = DBI->connect("DBI:mysql:database=email_categorisation;host=localhost;", "root", "", {'RaiseError' => 1});
    use Text::GenderFromName;

    $query = "select email from emails_1";
    $sth = $dbh->prepare($query);
    $email_record = $sth->execute();
    $count =0;
    while(($email) = $sth->fetchrow_array()) {
        #print "--$email--";
        $email =~/(.*\.*.*)\@(.+)/;
        my $user = $1;
        #print $user." -- ";
        $user=~s/[0-9|\_|\.]//g;
        #print $user." -- ";
        #matching
        $l = length($user);
        #print $l."\n\n";
        $flag=1;
        for ($i=0;$i<$l;$i++){
            if($flag==1){
                my $removed_last_letter = substr( $user, $i);
                #print $removed_last_letter."\n";
                $l2 = length($removed_last_letter);
                for ($j=0;$j<$l2;$j++){
                    my $removed_first_letter = substr ($removed_last_letter, 0,$l2-$j); # returns '' (no warning)
                    # print $removed_first_letter."\n";
                    if(length($removed_first_letter)>3){
                        my $gender = &gender($removed_first_letter) || '';
                        if ($gender eq 'f') { print "Email : $email Matched with : $removed_first_letter Gender: Female\n"; $flag=0;$count++; }
                        elsif ($gender eq 'm') { print "Email : $email Matched with : $removed_first_letter Gender: Male\n"; $flag=0; $count++;}
                        #else { print "$user : UNSURE\n"; break; }
                    }
                }
            }
        }
    }
    print "$count";

    regards, samson.
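
    For comparison, a sketch of the same substring search written as a function that stops at the first hit, under strict and warnings. The lookup_gender function and its name table are hypothetical stand-ins for Text::GenderFromName so the example runs without the module or a database. The cleanup step here strips only digits, underscores and dots.

```perl
use strict;
use warnings;

# Hypothetical lookup standing in for Text::GenderFromName so the sketch
# runs without the module; it knows only a handful of names.
my %NAME_GENDER = (jane => 'f', mary => 'f', john => 'm', peter => 'm');
sub lookup_gender { return $NAME_GENDER{ lc $_[0] } }

# Try every substring longer than 3 characters of the cleaned local part,
# longest first from each start position, and stop at the first hit.
sub find_name {
    my ($email, $lookup) = @_;
    my ($user) = $email =~ /\A([^\@]+)\@/ or return;
    $user =~ s/[0-9_.]//g;
    my $len = length $user;
    for my $start (0 .. $len - 4) {
        for my $size (reverse 4 .. $len - $start) {
            my $candidate = substr $user, $start, $size;
            my $gender = $lookup->($candidate) or next;
            return ($candidate, $gender);
        }
    }
    return;
}

my ($name, $gender) = find_name('xjohn99@example.com', \&lookup_gender);
print "matched $name ($gender)\n" if defined $name;
```

    Returning from the function on the first match replaces the $flag variable, and it also avoids counting one address more than once when several substrings happen to match.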