in reply to word association problem
For the words in wordlist.txt that contain one or more of the words in master.txt as a substring -- e.g. "accumbering", which contains both "accumb" and "accumber" -- you want to associate the wordlist word with the master word that constitutes the longest match -- i.e. "accumbering" should be listed with "accumber", not with "accumb". Have I got that right?
To do that, the approach is a little more detailed:
    use strict;    # mustn't forget that
    use warnings;

    open(LIST, "wordlist.txt") or die "can't open wordlist.txt: $!";
    open(MSTR, "master.txt")   or die "can't open master.txt: $!";

    # get the wordlist
    my @wordlist = map { chomp; $_ } <LIST>;

    # get the master list (skipping blank lines), sorted by word length,
    # longest words first
    my @master = sort { length($b) <=> length($a) }
                 grep { length } map { chomp; $_ } <MSTR>;

    # declare a hash to hold the findings:
    my %report;

    foreach my $lookfor ( @master ) {
        foreach my $lookat ( @wordlist ) {
            # \Q...\E makes the master word match literally
            if ( $lookat =~ /\Q$lookfor\E/ ) {
                $report{$lookfor} .= ",$lookat";
                $lookat = "";    # erases this word from @wordlist
            }
        }
    }

    foreach my $word ( sort keys %report ) {
        $report{$word} =~ s/,/ /;    # change initial comma to space
        print "$word:$report{$word}\n";
    }
By seeking out the longest master words first, and "erasing" the hits from the wordlist array as you find them, each wordlist element will only be listed once, with the longest matching master word.
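To see the "longest first, then erase" idea on a small scale, here is a minimal sketch that works on in-memory lists instead of files. The sub name longest_match_report and the sample words "accumbent" and "zebra" are my own inventions for illustration, and it uses index() rather than a regex for the substring test:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): same
# longest-first approach, but on array refs so no files are needed.
sub longest_match_report {
    my ($master, $wordlist) = @_;

    # copy the word list so "erasing" hits leaves the caller's array alone
    my @remaining = @$wordlist;

    my %report;
    # longest master words first; empty strings are skipped
    foreach my $lookfor ( sort { length($b) <=> length($a) }
                          grep { length } @$master ) {
        foreach my $lookat ( @remaining ) {
            next unless length $lookat;               # already claimed
            if ( index($lookat, $lookfor) >= 0 ) {    # plain substring test
                push @{ $report{$lookfor} }, $lookat;
                $lookat = "";                         # erase the hit
            }
        }
    }
    return \%report;
}

my $report = longest_match_report(
    [ qw(accumb accumber) ],               # stands in for master.txt
    [ qw(accumbering accumbent zebra) ],   # stands in for wordlist.txt
);

foreach my $word ( sort keys %$report ) {
    print "$word: @{ $report->{$word} }\n";
}
```

With that sample data, "accumbering" is claimed by "accumber" on the first pass, so only "accumbent" is left for "accumb", and "zebra" matches nothing and is not reported.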
Update: Chmrr's correction to my initial response came in while I was working on this one. He's right: his version will be more efficient (and he helped me fix a typo).