in reply to Re^2: Not Quite Longest Path Problem
in thread Not Quite Longest Path Problem

Well, this doesn't really answer my questions, since the two cases start to differ from 12 on. In the first case:

12: loop
(Can not use 'l' for the rest of level 2)

In the second case:

12: loop
(Can not use 'l' and 'p' for the rest of level 2)

Anyway: I assume you meant the first case. Then level 2 is just as easy (or difficult) as level 1, the only difference is that after step 11 (or 10) one character is blacklisted

Whether the algorithm I proposed would give better solutions probably depends on the density of words to non-words. In our case that would be 4000/ 26^4 = 1/114. I.e. only one in 114 four-letter combinations is a real word.

To find a path from the start word to one of the two words of the 6-word-chain one of the 48 combinations of intermediate words would have to be made out of real words, this happens once every approximately 114^3 /48 = 31000 times. The ratio is even worse because you are looking for expensive words that are probably not as well connected as the average word.

To be somewhat sure that you get at least one solution the program should at least be able to test 20 times as many word chains in the alloted run-time, lets say 31000 times 20 = 600.000 test per hour or 170 tests a second. So you have to find 170 valid chains of 6 words a second, rather a tall order. I think in this scenario the density is maybe too low for my algorithm to have much success.

Replies are listed 'Best First'.
Re^4: Not Quite Longest Path Problem
by Limbic~Region (Chancellor) on Oct 24, 2009 at 01:11 UTC
    jethro,
    I will be happy to compare any algorithm you can come up with to mine (I am in the process of writing it now). I am providing the word list I am using below (NSFW). You can use Text::Levenshtein to determine the edit distance. I calculated a word's score by (from memory and untested):
    my %freq; while (<$fh>) { chomp; $freq{$_}++ for split //; } my $tot = sum(values %freq); for (keys %freq) { $freq{$_} = 1 - ($freq{$_} / $tot); } sub score { my ($word) = @_; my $score = 0; $score += $freq{$_} for split //, $word; return $score; }
    Since I don't know the rules beyond level 3, assume the rule stays in affect for the rest of the chain. Your starting word should be accepted on the command line. I don't care what you limit the run time to assuming it is less than 4 hours on a reasonable box.

    Cheers - L~R

      As I said, my algorithm is probably not that good in this szenario

      It could be made somewhat feasible if I pregenerated a database of all the shortest paths between any 2 words (together with all letters used and the points it awards). That would be 4000x4000 = 16 million entries

      But with all the testing and optimizing that would be the work of a few days, which I don't have at the moment. Sorry

      Given this scoring mechanism, I would focus more on path length than scoring, as the scores between words doesn't differ wildly. The highest scoring word I found was fuzz, with a score of 3.92574, while the lowest scoring word was ease, scoring 3.62655. 10 times the 'cheapest' word still pays more than 9 times the most 'expensive' word.
        JavaFan,
        Now that I am the all-time FaceBook high score holder, I can tell you a little bit more about the game. I left a lot out and got 1 thing wrong in my original post.

        First, there a multiplier bonuses for changing a different position 4 times in a row consecutively. So if you were able to change position 3, then 1, then 4 and then 2, your multiplier goes from 1 to 2 (words are now worth twice their score). You can continue to increase your multiplier until you make a mistake (choose an invalid word or one you have already used). I have found that the Scrabble TWL is not the word list used by this application because occasionally, I get a word wrong.

        Second, every single level introduces a new rule so I really think using any graph theory is futile. At least the way I am thinking of it. Here is what I have been able to decipher so far:

        1. No restrictions
        2. Can't use the last letter used*
        3. Can't use the last position used
        4. Can't use the last letter or last position used
        5. No restrictions
        6. Can't use the last letter used on 1st word, then last two letters for remainder of level
        7. First word: can't use last position or letter from 2 words ago. Rest: can't use last 2 positions
        I could go on, but it just gets more complicated from there.

        Cheers - L~R

        Third, it does seem as you pointed out that it is more important to find longer changes of cheaper words because of bonuses then it is to find shorter chains of expensive words. The scoring by the way was made up by me - it is obvious they use letter frequency to determine the value of a letter but I didn't bother to record every one.

        * Elsewhere in this thread I indicate it is just the first letter used for the remainder of the level. In reality, it changes from word to word to the last letter used.