Re: Add-A-Gram Performance

Well I redid some of the code with all of your wonderful suggestions, especially grep and chipmunk, but now I have run into a snag. Rather then using %wordpl where the items are arrays of same length words and the keys are the lengths of those words, %wordpl's items are another hash with the sorted words as the keys and no items, and the keys are the lengths of the different sorted words. %wordr is away to reverse look up the actual words (ie the key is the same as the sorted letter, but point to an array of all the anagrams of those letters. But when i run this program it spits out:

 # Use of uninitialized value, <DICT> chunk 41.
File 'untitled:Desktop Folder:Will's Stuff:Applications:MacPerl ƒ:add-
+a-gram redux'; Line 15
# Use of uninitialized value, <DICT> chunk 41.
File 'untitled:Desktop Folder:Will's Stuff:Applications:MacPerl ƒ:add-
+a-gram redux'; Line 16
[download]

Line 15 and 16 assign %wordpl and %wordr, but i dont have a clue what is wrong. Here is the full code:

#!/usr/bin/perl -w
use strict;
my ($dict, $adda, $line, @words, $word, %wordpl, $k, @lengths, $l, @ad
+dw, $oword, $lo, $i, %wordr);
$dict="<untitled:Desktop Folder:WORD.LST";

####
#Open the dictionary file and Create a hash (%wordpl) of a hash,
#where each key in 2nd hash is the same length, and the keys for
#the 1st hash represent those lengths
####

open (DICT, $dict) or die ("Can't open $dict: $!\n");
while (<DICT>) {
    chomp;
    push @{ $wordr{sort {lc($a) cmp lc($b)} $_} }, $_;
    $wordpl{length()}{sort {lc($a) cmp lc($b)} $_}++;
}
close (DICT);

####
#Read the keys of %wordpl, i.e. the lengths of the elements in the 2nd
+ hash
#in descending order Start a loop where $_ is the length, or the key,
#for %wordpl.
####

foreach (sort { $b <=> $a } keys %wordpl) {
    push (@addw, word(%wordpl, $_, $oword, @addw));
}

####
#Start a loop where you pick the a word in keys %{ $wordpl{$l} }
#and pass that word and $l-1 (index of hash of words with one
#less character then aforementioned word) to word1()
####

sub word {
    my ($word, $l, %wordpl, $oword, $p, @addw, $oword);
    %wordpl=$_[0];
    $l=$_[1];
    $oword=$_[2];
    @addw=$_[3];
    foreach $word ( keys %{ $wordpl{$l} } ) {
        $oword=$word; #define original word
        my $p=$l-1;
        word1($p, $word, %wordpl, $l, $oword, @addw);
    }
    return @addw;
}

####
#Split $word from sub word into an @a, do the same for the word
#picked by WO. If the two words differ by one character, and the
#newest word ($sword) is greater then 3 characters, pass $sword
#and $p-1 (index of hash of words with one less character then
#aforementioned word) to word2(). When word2() exits, exit WO
#and return to word(). If the newest word is equal to three 
#characters in length, push that word into the array of 
#add-a-grams and exit WO to word() to restart process
####


sub word1 {
    my ($p, @a, %wordpl, $l, $sword, @b, %seen, @aonly, $item, @addw, 
+$oword);
    $p=$_[0];
    @a=split(//,$_[1]);
    %wordpl=$_[2];
    $l=$_[3];
    $oword=$_[4];
    @addw=$_[5];
    WO: foreach $sword ( keys %{ $wordpl{$p} } ) {
        @b=split(//,$sword);
        %seen = ();
        @aonly = ();
        @seen{@b}=();
        foreach $item (@a) {
            push (@aonly, $item) unless exists $seen{$item};
        }
        if (length(@aonly)==1) {
            if ($p==3) {
                push @addw, $oword;
                last WO;
            }
            $p--;
            word2($p, $sword, %wordpl, $l, $oword, @addw);
            last WO;
        }
    }
}


####
#Split $sword from sub word1 into an @a, do the same for the word
#picked by WT. If the two words differ by one character, and the
#newest word ($sword) is greater then 3 characters, pass $sword
#and $p-1 (index of hash of words with one less character then
#aforementioned word) to word1(). When word1() exits, exit WT
#and return to word(). If the newest word is equal to three 
#characters in length, push that word into the array of 
#add-a-grams and exit WT to word() to restart process
####

sub word2 {
    my ($p, @a, %wordpl, $l, $sword, @b, %seen, @aonly, $item, @addw, 
+$oword);
    $p=$_[0];
    @a=split(//,$_[1]);
    %wordpl=$_[2];
    $l=$_[3];
    $oword=$_[4];
    @addw=$_[5];
    WT: foreach $sword ( keys %{ $wordpl{$p} } ) {
        @b=split(//,$sword);
        %seen = ();
        @aonly = ();
        @seen{@b}=();
        foreach $item (@a) {
            push (@aonly, $item) unless exists $seen{$item};
        }
        if (length(@aonly)==1) {
            if ($p==3) {
                push @addw, $oword;
                last WT;
            }
            $p--;
            word2($p, $sword, %wordpl, $l, $oword, @addw);
            last WT;
        }
    }
}

####
#print all the anagrams
####

foreach $i (@addw) {
    foreach $lo (@{ $wordr{$i} }) {
        print "$lo\n";
    }
}
[download]

Comment on Re: Add-A-Gram Performance Select or Download Code

Replies are listed 'Best First'.
Re: Re: Add-A-Gram Performance by runrig (Abbot) on Feb 06, 2002 at 00:59 UTC
One minor thing. The word file on the website contains NO uppercase letters nor any characters besides a-z (as a 'grep "^a-z"' of the file shows). So the lc is useless in this particular instance.	[reply]
Re: Re: Re: Add-A-Gram Performance by smgfc (Monk) on Feb 06, 2002 at 02:12 UTC
Thanks alot. I actually didnt realize that, i just found that particular piece of code in Programming Perl, and didnt consider the dictionary file. I was about to take it out when i thought i would leave it, just incase I decided to use a different dictionary, etc. etc. So unless I am wrong in thinking lc is pretty effiecient I think I will leave it. Then again there are ~170,000 words in that dictionary file, so 680000 lc calls might slow it down a little! :) Thanks again!	[reply]