in reply to Re: Not Quite Longest Path Problem
in thread Not Quite Longest Path Problem

jethro,
Rather than fail with words, I will try not to fail with an example:
Level 1
 9: moon
10: noon
Level 2
11: loon (Can not use 'l' for the rest of level 2)

Actually, I think it is the letter used to make the last word of level 1 that becomes unavailable for level 2, but it really doesn't change the problem. Perhaps I haven't worked with graphs and graph theory enough to understand how continuing to think about this as a graph helps. According to Wikipedia, this is an NP-complete problem. Since I indicated I am fine with a suboptimal solution, I want to stick with something simple and fast. Do you think this approach would consistently produce better solutions than my drop-dead-simple approach outlined in the root node, with a run-time under an hour?

Cheers - L~R

Re^3: Not Quite Longest Path Problem
by jethro (Monsignor) on Oct 24, 2009 at 00:57 UTC

    Well, this doesn't really answer my questions, since the two cases only start to differ at step 12. In the first case:

    12: loop
    (Can not use 'l' for the rest of level 2)

    In the second case:

    12: loop
    (Can not use 'l' and 'p' for the rest of level 2)

    Anyway: I assume you meant the first case. Then level 2 is just as easy (or difficult) as level 1; the only difference is that after step 11 (or 10) one character is blacklisted.

    Whether the algorithm I proposed would give better solutions probably depends on the density of words among non-words. In our case that would be 4000 / 26^4 ≈ 1/114, i.e. only one in 114 four-letter combinations is a real word.

    To find a path from the start word to one of the two words of the 6-word chain, one of the 48 combinations of intermediate words would have to consist entirely of real words; that happens roughly once every 114^3 / 48 ≈ 31,000 attempts. The ratio is actually worse, because you are looking for expensive words, which are probably not as well connected as the average word.

    To be somewhat sure of getting at least one solution, the program should be able to test about 20 times as many word chains in the allotted run-time, let's say 31,000 × 20 = 620,000 tests per hour, or roughly 170 tests a second. So you would have to find 170 valid chains of 6 words every second, which is rather a tall order. I think the density in this scenario is too low for my algorithm to have much success.
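For what it's worth, the back-of-envelope numbers above can be checked with a short Perl snippet (the 4000-word list size, the 48 intermediate combinations, and the factor of 20 are all taken from the estimate above):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $words   = 4000;      # size of the word list
my $strings = 26 ** 4;   # all four-letter combinations
my $density = $strings / $words;   # ~114: one real word per 114 strings

# Following the estimate above: a random attempt at a 6-word chain
# succeeds roughly once every density^3 / 48 tries.
my $tries_per_hit = $density ** 3 / 48;   # ~31,000

# For ~20 expected hits within the one-hour budget:
my $tests_per_hour   = 20 * $tries_per_hit;
my $tests_per_second = $tests_per_hour / 3600;

printf "density:             1 in %.0f\n", $density;
printf "tries per hit:       %.0f\n",      $tries_per_hit;
printf "tests/second needed: %.0f\n",      $tests_per_second;
```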

      jethro,
      I will be happy to compare any algorithm you can come up with to mine (I am in the process of writing it now). I am providing the word list I am using below (NSFW). You can use Text::Levenshtein to determine the edit distance. I calculated a word's score by (from memory and untested):
      my %freq;
      while (<$fh>) {
          chomp;
          $freq{$_}++ for split //;
      }
      my $tot = sum(values %freq);
      for (keys %freq) {
          $freq{$_} = 1 - ($freq{$_} / $tot);
      }

      sub score {
          my ($word) = @_;
          my $score = 0;
          $score += $freq{$_} for split //, $word;
          return $score;
      }
      Since I don't know the rules beyond level 3, assume the rule remains in effect for the rest of the chain. Your starting word should be accepted on the command line. I don't care what you limit the run time to, provided it is less than 4 hours on a reasonable box.
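A minimal, self-contained sketch of how those pieces might fit together: scoring with the letter-frequency scheme above, plus a neighbour test. The tiny word list here is a toy stand-in for the real (NSFW) list, and the one-letter-substitution check is my own helper; for equal-length words it agrees with Text::Levenshtein's distance() being 1.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(sum);

# Toy stand-in for the real word list linked in the thread.
my @words = qw(moon noon loon loop fuzz ease);

# Letter-frequency scoring as described above.
my %freq;
$freq{$_}++ for map { split // } @words;
my $tot = sum(values %freq);
$freq{$_} = 1 - $freq{$_} / $tot for keys %freq;

sub score {
    my ($word) = @_;
    return sum map { $freq{$_} } split //, $word;
}

# Two equal-length words are neighbours if they differ in exactly one
# letter (i.e. edit distance 1 for same-length words).
sub neighbours {
    my ($x, $y) = @_;
    return 0 unless length($x) == length($y);
    my $diff = grep { substr($x, $_, 1) ne substr($y, $_, 1) }
               0 .. length($x) - 1;
    return $diff == 1;
}

printf "score(moon) = %.4f\n", score('moon');
print "moon->noon neighbours? ", neighbours('moon', 'noon') ? "yes" : "no", "\n";
```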

      Cheers - L~R

        As I said, my algorithm is probably not that good in this scenario.

        It could be made somewhat feasible if I pregenerated a database of all the shortest paths between any two words (together with all letters used and the points awarded). That would be 4000 × 4000 = 16 million entries.
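Such a table could be pregenerated with a breadth-first search from every word. A sketch under my own assumptions (a one-letter-substitution move between four-letter words, and a toy list standing in for the real 4000 words):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy stand-in for the real 4000-word list.
my @words = qw(moon noon loon loop);

# Build the adjacency list: words differing in exactly one letter.
my %adj;
for my $i (0 .. $#words) {
    for my $j ($i + 1 .. $#words) {
        my ($x, $y) = @words[$i, $j];
        my $diff = grep { substr($x, $_, 1) ne substr($y, $_, 1) } 0 .. 3;
        next unless $diff == 1;
        push @{ $adj{$x} }, $y;
        push @{ $adj{$y} }, $x;
    }
}

# BFS from one word yields shortest path lengths to every reachable word;
# running it from all 4000 words fills the 4000x4000 table mentioned above.
sub bfs {
    my ($start) = @_;
    my %dist  = ($start => 0);
    my @queue = ($start);
    while (my $w = shift @queue) {
        for my $n (@{ $adj{$w} || [] }) {
            next if exists $dist{$n};
            $dist{$n} = $dist{$w} + 1;
            push @queue, $n;
        }
    }
    return \%dist;
}

my $d = bfs('moon');
print "moon -> loop: $d->{loop} steps\n";
```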

        But with all the testing and optimizing, that would be a few days of work, which I don't have at the moment. Sorry.

        Given this scoring mechanism, I would focus more on path length than on scoring, as the scores of individual words don't differ wildly. The highest-scoring word I found was fuzz, with a score of 3.92574, while the lowest-scoring word was ease, scoring 3.62655. Ten of the 'cheapest' word still pay more than nine of the most 'expensive' word.