in reply to Not Quite Longest Path Problem

On level 2, the rule is "which ever letter you used to form the first word in level 2 can't be used in any word for the remainder of the level

Clarification please. Do you really mean that only the letter used in the first word can't be used thereafter? Or do you mean "which ever letter you used in any previous step can't be used from then on for the remainder of the level" ?

In the first case the graph wouldn't change after the first step, which would really trivially divide the problem into less than 26 unchanging subproblems. Only in the second case there is constant change in the graph

One refinement for level 2 in the second case would be to pick random pairs of expensive words (without any letter common to the first word of level 2) and look for an expensive path of 4 words between them. For every row of 6 words found this way check if there is a path from the first word of level 2 to either end of this row (since you can turn around the row without invalidating it).

Example:

First word in level 2 is "bean". Now randomly find two words that have neither 'b','e','a' and 'n' in them and no common letter between them. For example "womb" and "pixy". Now look for a string of 4 words between "womb" and "pixy" using no letter from "bean" inbetween. Since 4 of the 5 letters needed to change from womb to pixy are already known ('p','i','x' and 'y'), an exhaustive search of routes between the two words should be very fast.

If you find such a route (which should be very expensive since you could freely select 8 out of the 9 letters), you now would look for a route from 'bean' to either 'womb' or 'pixy' with 3 words inbetween, also rather trivial as the letters to use are all fixed.

Replies are listed 'Best First'.
Re^2: Not Quite Longest Path Problem
by Limbic~Region (Chancellor) on Oct 23, 2009 at 23:34 UTC
    jethro,
    Rather than fail with words, I will try not to fail with an example:
    Level 1 9: moon 10: noon Level 2 11: loon (Can not use 'l' for the rest of level 2)

    Actually, I think it is the letter to make the last word from level 1 that becomes unavailable for level 2 but it really doesn't change the problem. Perhaps I haven't worked with graphs and graph theory enough to understand how continuing to think about this as a graph helps. According to Wikipedia, this is an NP complete problem. Since I indicated I am fine with a suboptimal solution I want to stick to something simple and fast. Do you think this approach would produce solutions consistently better than my drop dead simple approach outline in the root node with a run-time under an hour?

    Cheers - L~R

      Well, this doesn't really answer my questions, since the two cases start to differ from 12 on. In the first case:

      12: loop
      (Can not use 'l' for the rest of level 2)

      In the second case:

      12: loop
      (Can not use 'l' and 'p' for the rest of level 2)

      Anyway: I assume you meant the first case. Then level 2 is just as easy (or difficult) as level 1, the only difference is that after step 11 (or 10) one character is blacklisted

      Whether the algorithm I proposed would give better solutions probably depends on the density of words to non-words. In our case that would be 4000/ 26^4 = 1/114. I.e. only one in 114 four-letter combinations is a real word.

      To find a path from the start word to one of the two words of the 6-word-chain one of the 48 combinations of intermediate words would have to be made out of real words, this happens once every approximately 114^3 /48 = 31000 times. The ratio is even worse because you are looking for expensive words that are probably not as well connected as the average word.

      To be somewhat sure that you get at least one solution the program should at least be able to test 20 times as many word chains in the alloted run-time, lets say 31000 times 20 = 600.000 test per hour or 170 tests a second. So you have to find 170 valid chains of 6 words a second, rather a tall order. I think in this scenario the density is maybe too low for my algorithm to have much success.

        jethro,
        I will be happy to compare any algorithm you can come up with to mine (I am in the process of writing it now). I am providing the word list I am using below (NSFW). You can use Text::Levenshtein to determine the edit distance. I calculated a word's score by (from memory and untested):
        my %freq; while (<$fh>) { chomp; $freq{$_}++ for split //; } my $tot = sum(values %freq); for (keys %freq) { $freq{$_} = 1 - ($freq{$_} / $tot); } sub score { my ($word) = @_; my $score = 0; $score += $freq{$_} for split //, $word; return $score; }
        Since I don't know the rules beyond level 3, assume the rule stays in affect for the rest of the chain. Your starting word should be accepted on the command line. I don't care what you limit the run time to assuming it is less than 4 hours on a reasonable box.

        Cheers - L~R