in reply to The Threeve Game

pileofrogs,
This really isn't on topic, so let me suggest turning it into a challenge that is.

Assume we have a list of multi-token words. In fact, assume that it is obtained using the following code:

#!/usr/bin/perl use strict; use warnings; use WWW::Mechanize; my $mech = WWW::Mechanize->new(autocheck => 1); for ('a' .. 'z') { my $url = "http://wordlist.com/index-$_.htm"; eval { $mech->get($url) }; if ($@) { warn "Unable to get link for letter '$_': $@\n"; next; } for my $link ($mech->links) { my $word = $link->text; next if ! $word || index($word, ' ') == -1; print "$word\n"; } }

The challenge is to make the longest chain of multi-token words where the end of one word overlaps the beginning of the next word by at least one token. Each word may only be used once in the chain. Here is an example:

area code code of ethics ethics committee

Here are some (likely not all) edge cases that I thought of.

I realize that this particular list stinks but I couldn't find a better one. If you use an alternate source, please link to it so that others may compete using the same list. Oh, I am pretty sure there is a fairly well known computer science problem hidden within so heuristics solutions are likely necessary.

It should be fairly obvious, but here is a hint if you are having a hard time dealing with such a large list:

Cheers - L~R

Replies are listed 'Best First'.
Re: Multi-token word chains (was The Threeve Game)
by planetscape (Chancellor) on Feb 10, 2010 at 02:12 UTC
    I realize that this particular list stinks but I couldn't find a better one. If you use an alternate source, please link to it

    In the past I've found googling for compound words to be helpful, particularly in turning up puzzle-related sites, such as:

    HTH,

    planetscape
Re: Multi-token word chains (was The Threeve Game)
by blokhead (Monsignor) on Feb 11, 2010 at 00:53 UTC
    FYI: The problem of finding the longest "chain" is NP-complete, even in the case of just 2-word phrases, as can be seen from a reduction from the longest path problem in directed graphs.

    Each single word is a vertex, and there is an edge from word A to word B if "A B" is a valid 2-word phrase. A path in the graph corresponds to a "chain" of valid 2-word phrases in this game, and we want to find the longest one.

    blokhead

      blokhead,
      FYI: The problem of finding the longest "chain" is NP-complete

      Yes, I know which is why I said "Oh, I am pretty sure there is a fairly well known computer science problem hidden within so heuristics solutions are likely necessary." See for instance, Not Quite Longest Path Problem. I am really not interested in someone finding the longest path but someone who can find a long path that is longer than everyone else and examine their heuristic solution.

      Cheers - L~R

Re: Multi-token word chains (was The Threeve Game)
by MidLifeXis (Monsignor) on Feb 10, 2010 at 14:40 UTC

    Add one more level to this by creating the largest loop that you can.

    The basic framework above, but also have the last token of the last set overlap with the first token of the first set.

    Not thinking of any right off the bat. Perhaps after my coffee I will update this node. Oh wait - "implementation is left as an exercise for the reader." ;-)

    It is said that "only perl can parse Perl." I don't even come close until my 3rd cup of coffee. --MidLifeXis