in reply to dice's coefficient

What's the coefficient of "aardvark" and "dark"? "aardvark" and "arbitrary"? Even http://en.wikipedia.org/wiki/Dice%27s_coefficient doesn't clarify this.

Not sure what you want to do with the coefficients, so I made stuff up:

use strict;
use warnings;

$| = 1;
print "Enter word: ";
chomp(my $word = <STDIN>);

my @pairs = $word =~ /(?=(..))/g;
my $matcher = qr/(?=(@{[join "|", @pairs]}))/;

my %coef;
open my $dict, "<", "/usr/share/dict/words"
    or die "Couldn't open dictionary: $!";
while (my $dictword = <$dict>) {
    chomp($dictword);
    # skip proper nouns and anything with a non-letter
    next if $dictword =~ /[^a-z]/;
    my $matches = () = $dictword =~ /$matcher/g;
    my $coef = 2 * $matches / (@pairs + length($dictword)-1);
    push @{$coef{$coef}}, $dictword;
}

print "Top coefficients for $word:\n";
for my $coef ((sort { $b <=> $a } keys %coef)[0..4]) {
    next if ! $coef;
    print "$coef: ", join " ", @{$coef{$coef}}, "\n";
}

Re^2: dice's coefficient
by hiddenOx (Novice) on Apr 19, 2008 at 06:20 UTC
    ySTH,

    That's really great, thank you. I had been searching for this kind of help for a while.

    But there is one issue: how can I make it match each pair only once? For example, assume the word is
    "gogo", so its bigrams are go-og-go,
    and the word to check against is "golo", so its bigrams are go-ol-lo.

    The current code reports 2 matches because it counts "go" twice, but "go" should be matched only once, so the match count should be 1.

    Please help.

    Awaiting your kind reply.
    Thank you very much.
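
One possible way to count each pair only once, as asked above, is to treat the bigrams of each word as a multiset, so that every occurrence in one word can pair with at most one occurrence in the other. The following is only a sketch under that assumption, not necessarily the definition the original poster intended; the helper names bigrams and dice_once are made up for illustration.

```perl
use strict;
use warnings;

# Split a word into its overlapping two-letter pairs (bigrams).
sub bigrams {
    my ($word) = @_;
    return $word =~ /(?=(..))/g;
}

# Dice's coefficient with a multiset intersection: each occurrence of a
# bigram in one word is matched against at most one occurrence in the other.
sub dice_once {
    my ($word_a, $word_b) = @_;
    my @pairs_a = bigrams($word_a);
    my @pairs_b = bigrams($word_b);
    return 0 unless @pairs_a && @pairs_b;

    # How many times each bigram is still available to be matched.
    my %available;
    $available{$_}++ for @pairs_a;

    my $matches = 0;
    for my $pair (@pairs_b) {
        next unless $available{$pair};
        $available{$pair}--;    # use up one occurrence
        $matches++;
    }
    return 2 * $matches / (@pairs_a + @pairs_b);
}

for my $case (["gogo", "golo"], ["golo", "gogo"], ["aardvark", "dark"]) {
    printf "%s / %s: %.3f\n", @$case, dice_once(@$case);
}
```

With this counting, "gogo" and "golo" share a single "go" in either order, giving 2 * 1 / (3 + 3), about 0.333, and "aardvark" and "dark" share "ar" and "rk", giving 2 * 2 / (7 + 3) = 0.4.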