in reply to force regular expression to match every item only once
A small modification to the code I provided for Dice's coefficient may be what you are after. Consider the following (note the two lines with the trailing ##):
```perl
use strict;
use warnings;
use List::Compare;

my @words = qw(dictate world mamal zezl);
my %dict;

# Build a lookup for the dictionary words
while (defined (my $word = <DATA>)) {
    chomp $word;
    next unless length $word;

    my %dup;
    # Non-overlapping character pairs, keeping only the first occurrence of each
    my @bigrams = grep {! $dup{$_}++} $word =~ /(..)/g; ##
    next unless @bigrams;    # skip words too short to form a pair
    $dict{$word} = \@bigrams;
}

# Process the given words
for my $word (@words) {
    my %dup;
    my @bigrams = grep {! $dup{$_}++} $word =~ /(..)/g; ##
    next unless @bigrams;    # an empty list could make the denominator zero

    for my $dictWord (keys %dict) {
        my $lc = List::Compare->new($dict{$dictWord}, \@bigrams);
        my @common = $lc->get_intersection();
        my $diceCoef = 2 * @common / (@bigrams + @{$dict{$dictWord}});

        next unless $diceCoef;
        print "Dice coefficient for '$word' and '$dictWord' is $diceCoef\n";
    }
}

__DATA__
words
zezezl
```
Prints:
```
Dice coefficient for 'world' and 'words' is 0.5
Dice coefficient for 'zezl' and 'zezezl' is 1
```
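The ## lines are where the change happens: `$word =~ /(..)/g` splits the word into non-overlapping character pairs, and `grep {! $dup{$_}++}` keeps only the first occurrence of each pair, so 'zezezl' contributes 'ze' once rather than twice. Here is a minimal stand-alone sketch of just that idiom; `unique_bigrams` is a helper name made up for illustration, not something from a module.

```perl
use strict;
use warnings;

# Hypothetical helper: non-overlapping two-character chunks (as /(..)/g
# gives them), with repeats dropped while preserving first-seen order.
sub unique_bigrams {
    my ($word) = @_;
    my %seen;
    # $seen{$_}++ yields 0 (false) the first time a pair is seen,
    # so !$seen{$_}++ is true only for that first occurrence.
    return grep { !$seen{$_}++ } $word =~ /(..)/g;
}

my @raw    = 'zezezl' =~ /(..)/g;       # ze ze zl
my @unique = unique_bigrams('zezezl');  # ze zl
my @none   = unique_bigrams('a');       # a one-letter word yields no pairs

print "raw:    @raw\n";
print "unique: @unique\n";
print "one-letter word gives ", scalar(@none), " bigrams\n";
```

The empty result for a one-letter word is why the `next unless @bigrams;` guard follows each ## line.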
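If you would rather not depend on List::Compare, the intersection can also be counted with a plain hash lookup. The following is a minimal sketch of that alternative, not the code above; `unique_bigrams` and `dice` are hypothetical names chosen for illustration. It uses the same non-overlapping, deduplicated bigrams, so it should give the same coefficients for these word pairs.

```perl
use strict;
use warnings;

# Hypothetical helper: unique non-overlapping bigrams, as in the ## lines.
sub unique_bigrams {
    my ($word) = @_;
    my %seen;
    return grep { !$seen{$_}++ } $word =~ /(..)/g;
}

# Hypothetical helper: Dice coefficient 2*|A & B| / (|A| + |B|),
# counting the intersection with a hash instead of List::Compare.
sub dice {
    my ($x, $y) = @_;
    my @a = unique_bigrams($x);
    my @b = unique_bigrams($y);
    return 0 unless @a && @b;             # no pairs: treat as no similarity
    my %in_a   = map { $_ => 1 } @a;
    my $common = grep { $in_a{$_} } @b;   # grep in scalar context counts matches
    return 2 * $common / (@a + @b);
}

for my $pair (['world', 'words'], ['zezl', 'zezezl']) {
    my ($w1, $w2) = @$pair;
    print "Dice coefficient for '$w1' and '$w2' is ", dice($w1, $w2), "\n";
}
```

Because both bigram lists are already deduplicated, counting hash hits on one side gives the size of the set intersection, and the result is symmetric in its two arguments.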