D:\> 1123881a.pl
--- Ln 25 Printing keys and values for the HASHES, %F1 and %F2
...and creating ARRAYS @F1combined and @F2combined.
brown => 1
all => 1
their => 1
country => 1
is => 1
fox => 1
aid => 1
while => 1
to => 2
time => 1
for => 1
Now => 1
spooon => 1
dog's => 1
of => 1
away => 1
and => 1
with => 1
over => 1
run => 1
the => 5
quick => 1
fork => 1
red => 1
come => 1
men => 1
good => 1
lazy => 1
jump => 1

--- %F2, next: ---
brown => 1
all => 1
runs => 1
jumps => 1
their => 1
country => 1
back => 1
is => 1
fox => 1
aid => 1
time => 1
while => 1
to => 2
for => 1
spoon => 1
dog's => 1
of => 1
away => 1
now => 1
with => 1
the => 6
and => 1
over => 1
fork => 1
quick => 1
red => 1
come => 1
men => 1
lazy => 1
good => 1

$item_count: 30
# some output, similar to the following, has been deleted for the sake of brevity
# (even so, brevity is not present in abundance)
...
>> Ln 56 $i: 22 |spooon => 1|
didn't match entry, |spooon => 1|
>> Ln 56 $i: 23 |the => 5|    # No allowance for variance in occurrences: 5 in file1 and 6 in DATA
found |the| in both arrays (files)    # Part of the loose spec; should this count for similarity or dis-similarity?
>> Ln 56 $i: 24 |their => 1|
found |their| in both arrays (files)
>> Ln 56 $i: 25 |time => 1|
found |time| in both arrays (files)
>> Ln 56 $i: 26 |to => 2|
found |to| in both arrays (files)
...
Use of uninitialized value $entry in scalar chomp at D:\_Perl_\pl_test\1123881a.pl line 55, line 1.
Use of uninitialized value $entry in concatenation (.) or string at D:\_Perl_\pl_test\1123881a.pl line 56,
> Ln 56 $i: 29 ||
Use of uninitialized value $entry in pattern match (m//) at D:\_Perl_\pl_test\1123881a.pl line 57, line 1
$match_count: 27
$mismatch: 2

SLOPPY SPEC (AMONG OTHER ISSUES): when a word occurs a different number of times in one file than in the other, that is not treated as a mismatch. If the word is present in both files, even in differing quantities, it's counted as a match.
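If you want the stricter reading of the spec, here is a minimal sketch, assuming the word counts already sit in two hashes shaped like %F1 and %F2 in the dump above (word => count). It walks the union of the keys rather than two index-aligned arrays, so a shorter list never hands back undef, and an optional switch makes a word that appears in both files with different counts (e.g. "the" => 5 vs 6) count as a mismatch. The sub name compare_counts and its third argument are hypothetical, not taken from the original script, and the sample hashes are trimmed to four words from the output.

```perl
use strict;
use warnings;

# Hypothetical helper: compare two word => count hashes key by key.
# With $strict_counts true, a word present in both hashes but with
# different counts is treated as a mismatch.
sub compare_counts {
    my ($h1, $h2, $strict_counts) = @_;

    # Union of the words from both files; no index alignment needed.
    my %union;
    $union{$_} = 1 for keys %$h1, keys %$h2;

    my ($match, $mismatch) = (0, 0);
    for my $word (sort keys %union) {
        if (exists $h1->{$word} && exists $h2->{$word}) {
            if ($strict_counts && $h1->{$word} != $h2->{$word}) {
                $mismatch++;    # in both files, but counts differ
            }
            else {
                $match++;
            }
        }
        else {
            $mismatch++;        # word found in only one file
        }
    }
    return ($match, $mismatch);
}

# Small sample taken from the %F1 / %F2 dump above
my %F1 = (the => 5, to => 2, spooon => 1, fox => 1);
my %F2 = (the => 6, to => 2, spoon  => 1, fox => 1);

my ($loose_m,  $loose_mm)  = compare_counts(\%F1, \%F2, 0);  # presence only
my ($strict_m, $strict_mm) = compare_counts(\%F1, \%F2, 1);  # counts must agree
print "loose:  matches=$loose_m mismatches=$loose_mm\n";
print "strict: matches=$strict_m mismatches=$strict_mm\n";
```

With this sample the loose run reports 3 matches and 2 mismatches ("spooon" and "spoon" each appear in only one hash); the strict run also flags "the", giving 2 matches and 3 mismatches. Whether a word found in only one file should count once or twice is another judgment call the spec leaves open; this sketch counts each unmatched word once.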
No allowance is made for arrays with different numbers of elements (the difference produces "uninitialized" warnings).

Here's one measure of SIMILARITY (matches / total elements evaluated): 0.9
Another uses the total of matches and mismatches as the divisor: 0.931034482758621
Magnitude of DIS-similarity (the ratio mismatches / matches): 0.0740740740740741
By the same sloppy spec, but using mismatches / elements_in_first_array: 0.0689655172413793
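The four figures can be reproduced from the counters the script prints. A small sketch, assuming 27 matches and 2 mismatches as reported, 30 elements evaluated ($item_count) and 29 elements in the first array (the number of keys listed for %F1 above); the variable and hash-key names are illustrative only, not from the original script.

```perl
use strict;
use warnings;

# Counters as reported in the run above
my $match_count = 27;
my $mismatch    = 2;
my $item_count  = 30;   # elements evaluated ($item_count)
my $first_count = 29;   # keys listed for %F1

my %measure = (
    similarity_by_total     => $match_count / $item_count,                 # 27/30
    similarity_by_evaluated => $match_count / ($match_count + $mismatch),  # 27/29
    dissimilarity_ratio     => $mismatch / $match_count,                   # 2/27
    mismatch_by_first       => $mismatch / $first_count,                   # 2/29
);

# Perl's default number stringification gives 15 significant digits,
# the same form as the figures quoted above.
printf "%-24s %s\n", $_, $measure{$_} for sort keys %measure;
```

The two similarity figures differ only in the divisor: 30 elements evaluated versus 29 matches plus mismatches.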