in reply to Challenge: 8 Letters, Most Words

Another brute force method that puts all words in 2of12inf.txt in 8 letter classes and counts their members. Runs 40mins on my machine.

use strict; use warnings; use Data::Dumper; sub fits { my( $class, $subclass ) = @_; return 0 if length( $class ) < length( $subclass ); $class =~ s/$_// or return 0 for split //, $subclass; return 1; } open my $dict, "<", "2of12inf.txt" or die "Cannot open dictionary!\n"; my $words = do { local $/; <$dict> }; close $dict; my @words = sort { length($b) <=> length($a) } sort $words =~ /\b(\w{1 +,8})\b/g; my %classes; my $cnt = 0; for my $w (@words) { my $ws = join '', sort split //, $w; my $fitted = 0; for my $class (keys %classes) { if( fits( $class, $ws ) ) { $classes{$class}{$w}++; $fitted++; } } $classes{$ws}{$w}++ unless $fitted; } my @sorted = sort { scalar keys %{$classes{$b}} <=> scalar keys %{$cla +sses{$a}} } keys %classes; for( @sorted ) { print "$_: ", scalar keys %{$classes{$_}}, "\n"; }

and here are my top ten:

aeinprst: 346 aeilprst: 344 adeiprst: 332 aeimprst: 328 adeilrst: 319 adeoprst: 316 aeilnpst: 314 adeinrst: 313 aeloprst: 313 aeilnrst: 312

Replies are listed 'Best First'.
Re^2: Challenge: 8 Letters, Most Words
by hdb (Monsignor) on Oct 05, 2013 at 14:11 UTC

    Cleaned up the code a bit:

    use strict; use warnings; sub fits { my( $class, $subclass ) = @_; $class =~ s/$_// or return 0 for split //, $subclass; return 1; } open my $dict, "<", "2of12inf.txt" or die "Cannot open dictionary!\n"; my $words = do { local $/; <$dict> }; close $dict; my @words = sort { length($b) <=> length($a) } sort $words =~ /\b(\w{1 +,8})\b/g; my %classes; for my $w (@words) { my $fitted = 0; fits( $_, $w ) and $fitted = $classes{$_}{$w} = 1 for keys %classes; $classes{join '', sort split //, $w}{$w} = 1 unless $fitted; } my @sorted = sort { scalar keys %{$classes{$b}} <=> scalar keys %{$cla +sses{$a}} } keys %classes; print "$_: ", scalar keys %{$classes{$_}}, "\n" for @sorted;

    So this is how it works:

    1. Reads the dictionary and sorts the words descending according to length, ie the 8-letter words first, then the 7-letter words etc. It also sorts alphabetically but that is not required.
    2. It then goes through the list of words and builds classes or sets of words indexed by their common letters. More specifically, it checks all classes created so far, whether there is one having all the letters needed for the next word.
    3. If yes, it adds it to all existing classes where it fits. It is important to add them not to one class but to all fitting classes.
    4. If not, it creates a new one.
    5. For this, it is important that long words are processed before short words.
    6. Whether it fits or not, is done by removing each letter of the new word from the letters of the class. If this is possible for all letters of the new word, then it fits and will be added to the class.
    7. After processing all words, it sorts the classes by number of members descending and prints them in that order.
    It did run in 34 minutes. I'll be grateful for any hints on performance optimization.

    Update: As I realized when reading Re^5: Challenge: 8 Letters, Most Words this solution will not be able to put two four letter words into one class irrespective of the letters. So in some situations it will not find the best solution.