in reply to Optimizing a string processing sub
    sub score {
        my ($word1, $word2) = @_;
        my (%chars1, %chars2) = ();

        $chars1{$_}++ for split '', $word1;
        $chars2{$_}++ for split '', $word2;

        # the minimum of the two hashes is the number in common for each letter
        my $sum = 0;
        $sum += ($chars1{$_} < $chars2{$_} ? $chars1{$_} : $chars2{$_})
            for keys %chars1;

        return $sum;
    }

    while (<DATA>) {
        chomp;
        print "$_: " . score(split /\s+/) . " in common\n";
    }

    __DATA__
    perl monk
    help temp
    frood hoopy
    bilbo baggins
    jibber jabber

This prints:

    perl monk: 0 in common
    help temp: 2 in common
    frood hoopy: 2 in common
    bilbo baggins: 2 in common
    jibber jabber: 5 in common
I notice your algorithms give 3 matches for 'bilbo' and 'baggins'. I think this is because both 'b's in bilbo match inside baggins. I'm not sure if this is correct behavior by your specifications or not.
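To make the difference concrete, here is a minimal sketch of the two counting rules. I am assuming the original subs count every character of the first word that occurs anywhere in the second, which is my guess at why they report 3. Both `count_present` and `count_common` are hypothetical names used only for illustration, not subs from the original post.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the original approach (an assumption):
# count every character of $word1 that appears anywhere in $word2,
# so a repeated character is counted each time it occurs in $word1.
sub count_present {
    my ($word1, $word2) = @_;
    return scalar grep { index($word2, $_) >= 0 } split //, $word1;
}

# Multiset intersection: each character of $word2 can be matched only once.
sub count_common {
    my ($word1, $word2) = @_;
    my (%chars1, %chars2);
    $chars1{$_}++ for split //, $word1;
    $chars2{$_}++ for split //, $word2;

    my $sum = 0;
    for my $c (keys %chars1) {
        my $n = $chars2{$c} // 0;    # 0 if the letter is absent from $word2
        $sum += $chars1{$c} < $n ? $chars1{$c} : $n;
    }
    return $sum;
}

print count_present('bilbo', 'baggins'), "\n";   # b, i, b match -> 3
print count_common('bilbo', 'baggins'), "\n";    # b, i match    -> 2
```

Which of the two is right depends on whether a single 'b' in baggins is allowed to pair with both 'b's in bilbo.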
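Update: To speed up your score2 sub, consider using index($word2, $a) >= 0 instead of the regex match. (index returns -1 when the substring is absent and 0 for a match at the first character, so the test has to be >= 0 rather than > 0.) Changing this alone made it approximately as fast as your initial score sub for me.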
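One thing to watch with index is its return value: it gives the zero-based position of the first match, or -1 when there is none, so a match at the very start of the string returns 0. A quick sketch of that behavior follows; `has_char` is a hypothetical helper for illustration, not part of the original score2.

```perl
use strict;
use warnings;

my $word2 = 'baggins';

# index returns the zero-based position of the first match, or -1 if none.
print index($word2, 'b'), "\n";   # 0  (match at the very start)
print index($word2, 'g'), "\n";   # 2
print index($word2, 'z'), "\n";   # -1

# Hypothetical helper: true if $char occurs anywhere in $word.
# Comparing with >= 0 keeps a match at position 0 from being missed.
sub has_char {
    my ($word, $char) = @_;
    return index($word, $char) >= 0;
}

print has_char($word2, 'b') ? "found\n" : "missing\n";   # found
print has_char($word2, 'z') ? "found\n" : "missing\n";   # missing
```

A side benefit over interpolating the character into a regex is that index treats it as a literal string, so characters such as '.' or '*' need no escaping.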
blokhead