Limbic~Region has asked for the wisdom of the Perl Monks concerning the following question:
You and 7 friends are going to a costume party dressed as letters of the alphabet. Which 8 letters should you choose in order to be able to form the most words.
I asked a few clarifying questions:Your challenge, if you choose to accept it, is to solve the puzzle (provide code, link to dictionary you used, 8 letter solution as well as the number of words that can be made). If you don't have a dictionary, I tend to use the 2of12inf from the Official 12Dicts Package.
Cheers - L~R
|
---|
Replies are listed 'Best First'. | |
---|---|
Re: Challenge: 8 Letters, Most Words (Update2:Then again :)
by BrowserUk (Patriarch) on Oct 04, 2013 at 16:05 UTC | |
Update2: Second guessing myself. Update: Amongst other possible errors, I get:
Just brute force (takes 3 or 4 seconds on my 179000 word dictionary): <Reveal this spoiler or all in this thread>
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] [d/l] [select] |
by Limbic~Region (Chancellor) on Oct 04, 2013 at 19:18 UTC | |
This certainly doesn't look like brute force to me. First you get the words each letter appears in (regardless of how many times it appears). Then, you consider the top 8 letters based on how many words contain the letter but only allow each letter to appear once. Your solution would probably be a lot faster if you also threw out any word from the dictionary that had repeated letter since I can't see how you would ever consider them. Cheers - L~R | [reply] |
by McA (Priest) on Oct 04, 2013 at 19:47 UTC | |
Hi BrowserUK, while I'm looking at the solutions presented here, I tried to understand your solution knowing that you have often a different and interesting view of looking at the problem. I did a litte test case dict:
With this dict the letters to choose to get the most matching words is 'bfffffff'. You algorithm prints something different. What have I missed? Best regards | [reply] [d/l] |
by BrowserUk (Patriarch) on Oct 04, 2013 at 20:00 UTC | |
you have often a different and interesting view of looking at the problem. How many dictionary words contain 7 repeats of the same letter? :) With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] |
by Limbic~Region (Chancellor) on Oct 05, 2013 at 04:01 UTC | |
by McA (Priest) on Oct 04, 2013 at 20:20 UTC | |
Re: Challenge: 8 Letters, Most Words
by tobyink (Canon) on Oct 04, 2013 at 15:35 UTC | |
OK, this uses some fairly naive heuristics to narrow down the search field. I'm sure somebody can do better, though I'd be fairly surprised if anybody came up with an answer that differed from mine by more than 2 letters.
The dictionary used was the standard /usr/share/dict/words supplied with Ubuntu 13.04. The best set of letters it found was "aeinprst" which could make 426 words. (The list appears to have some duplicates because some words appear in /usr/share/dict/words more than once with different cases.)
use Moops; class Cow :rw { has name => (default => 'Ermintrude') }; say Cow->new->name
| [reply] [d/l] [select] |
by dwhite20899 (Friar) on Oct 04, 2013 at 16:40 UTC | |
Edit: if "p" makes the costume right, they can also be "b", "d" or "q". As far as I can tell, this doesn't violate the rules above. | [reply] |
by Limbic~Region (Chancellor) on Oct 05, 2013 at 04:17 UTC | |
[reply] | |
Re: Challenge: 8 Letters, Most Words
by LanX (Saint) on Oct 04, 2013 at 17:38 UTC | |
Some theory: For a minute I thought this can be trivially solved by counting the normalization of all words (<=8) from the dictionary in a hash... e.g.
As next step successively the count from all smaller words had to be added to covering words, e.g striking the "n" from "ceelnort" leads to "ceelort", so $count{ceelnort}+=$count{ceelort} But than I realized that the best covering word from the dictionary is not necessarily the best solution. take this counterexample for 3 out of 4 letters, the number showing the coverage-count
so the word (a,b,d) is the maximum with a count 4, but the set (a,b,c) would cover 5 words!!! (yes this also works with repeated letters) IMHO this problem belongs to the family of Maximum coverage problem and Set_cover_problem, so finding a guarantied best solution shouldn't be trivial w/o brute force. OTOH adapting the sort order of letters might already lead to very good solutions...
Cheers Rolf ( addicted to the Perl Programming Language)
updateMaybe you can use the above count hash to solve the dual problem: "which of the n-8 letters cover the minimum of words" (n including repetition) E.g. "d" is in only one out of 6 words with 4 letters => the remaining 3 letters cover 5 words. "c" is only in 2 remaining words => (a,b) cover a maximum of 3 words and so on. Not sure if this leads to the guarantied best solution, sounds to easy... =) | [reply] [d/l] [select] |
by Limbic~Region (Chancellor) on Oct 05, 2013 at 03:31 UTC | |
[reply] | |
Re: Challenge: 8 Letters, Most Words
by Limbic~Region (Chancellor) on Oct 04, 2013 at 18:13 UTC | |
I have an idea for an exhaustive solution. I haven't convinced myself it will work yet but it will require multiple phases and at least one of the phases will need to be written in C. I will update when I can play even if it turns out not to be viable. Update: Ok, I finally have something running that appears to work though I won't know for some time if it is correct. I ended up using 2 pieces of code. A short perl script and a C program that I will share despite the fact it is probably horrible. Perl script. Used to filter the dictionary as well as determine the maximum number of times each letter is repeated in a word. Read more... (819 Bytes)
The C program. It uses a high water mark algorithm. To ensure it was working, I had it output every time the high water mark was reached or exceeded rather than waiting all the way until the end. Read more... (6 kB)
I am sure there is a much better way of doing this. I will need to achieve over 5 million loops per second to achieve my goal of being completed within 24 hours. I kicked it off at 10:28PM EST. I will update again when it is completed. Update 2: It only took 12 minutes to run which worries me, but here is the output (winner at the bottom): Read more... (1021 Bytes)
Cheers - L~R | [reply] [d/l] [select] |
Re: Challenge: 8 Letters, Most Words
by CountZero (Bishop) on Oct 04, 2013 at 20:06 UTC | |
Output: Corrected the "off-by-one" error. Re-corrected the off-by-one error which was due to an empty line in the small test dictionaries I used. CountZero A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James My blog: Imperial Deltronics | [reply] [d/l] [select] |
by McA (Priest) on Oct 04, 2013 at 20:34 UTC | |
Hi CountZero, I'm pretty sure you have the wrong driver. Look at this dictionary:
The solution is clearly something with 6 x 'a'. Your program states something different. Best regards | [reply] [d/l] |
by CountZero (Bishop) on Oct 04, 2013 at 20:52 UTC | |
So my solution answers the question with one extra condition: that the 8 characters together form a word too. And I seem to have an "off by one error" somewhere. CountZero A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James My blog: Imperial Deltronics | [reply] |
by choroba (Cardinal) on Oct 04, 2013 at 21:57 UTC | |
by Limbic~Region (Chancellor) on Oct 05, 2013 at 03:42 UTC | |
| |
by Limbic~Region (Chancellor) on Oct 04, 2013 at 21:11 UTC | |
by CountZero (Bishop) on Oct 05, 2013 at 08:27 UTC | |
| |
by McA (Priest) on Oct 04, 2013 at 21:06 UTC | |
by BrowserUk (Patriarch) on Oct 04, 2013 at 22:14 UTC | |
| |
by Limbic~Region (Chancellor) on Oct 05, 2013 at 03:16 UTC | |
What was the "off-by-one" error? Using the same dictionary file, I get aeinprst = 346 which I have manually verified. This isn't brute force (as you have indicated elsewhere) - it is heuristic but appears to come up with the correct result. While I like it, I needed to know for certain which is why I wrote an exhaustive solution which amazingly finished in 12 minutes. Cheers - L~R | [reply] |
by CountZero (Bishop) on Oct 05, 2013 at 14:08 UTC | |
It is quite funny that my brute force code finds the right solution even without going through all possible combinations. I guess I was just lucky. CountZero A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James My blog: Imperial Deltronics | [reply] |
Re: Challenge: 8 Letters, Most Words
by aaron_baugher (Curate) on Oct 05, 2013 at 03:41 UTC | |
I tried coming at this from another direction. Instead of looping through the possible letter combinations, I started with the word list and an empty array of "stacks" of words that could fit into the same set of 8 letters. For each word in the list, I checked it against each "stack," adding it to any stacks it could fit into, and starting a new stack of its own. The first element in each stack is a list of the letters required to make all the words in that stack. So I'm able to compare against this one "word," rather than all the words in the stack each time. I could save some memory, and maybe some CPU, by not saving the words at all, just the number that fit in the stack, but I wanted to keep track of the words so I could verify that it's working right. For each stack (sub-array), I do a quick check to see if the new word added to this stack would obviously require more than 8 letters, so it can bail out quickly if that's the case. Otherwise it does a more complex check to see what set of letters would be required to add this word to the stack, and does so if it fits. In the end, it picks the largest stack. I ran it against the file 2of12.txt, skipping hyphenated words, lowering the case on everything, and removing duplicates, and it took almost exactly one hour on my system to process the remaining 21664 2-8 letter words. I'm rerunning it now against 2of12inf.txt, which I expect will take at least four times as long, maybe more. In the meantime, here's my code and results. I think there are probably some pretty big inefficiencies here, but I hope the method is interesting, at least.
Update: Running my script on the larger 2of12inf.txt had the following results in 3.5 hours:
Aaron B. | [reply] [d/l] [select] |
by Limbic~Region (Chancellor) on Oct 05, 2013 at 12:45 UTC | |
I considered a bucket approach as well - but not quite the way you did it. What I was going to do was bucket all words of the same length and then at the end roll up 2 letter words into 3 letter words, 3 letter words into 4 letter words, etc. I realized this won't guarantee the best solution because of what I have explained elsewhere in this thread (taking some from one group and some from another can lead to the best solution) I ran your code against 2of12inf.txt (actually, the file I had already reduced). On my machine it took 2 hours and 3 minutes. Here is the output: The correct answer has 346 words. Cheers - L~R | [reply] [d/l] |
by aaron_baugher (Curate) on Oct 05, 2013 at 13:45 UTC | |
Yes, I had it dump the complete array to a file the last time, and the winning set "painters" only gets 292 words from my script. I had a feeling it could miss some somehow, but I can't quite figure out how yet. Update: I see the problem now. Let's say "painters" is the 100th word, and "at" is word #2 and "not" is word #3. "not" gets added to bucket #2 along with "at" because it can fit, but now "painters" can't go into that bucket. And since "at" has already been placed in bucket #2 and possibly bucket #1, it can't be placed in bucket #100 with "painters" or in any of the buckets in between that "painters" might be in. So to make my approach work, I guess once all the words have been placed in at least one bucket, I would have to go back through them again, retrying them against each bucket past the one they started in. And even then I'm not sure you'd be guaranteed to get a bucket with the best possible combination. I had a nagging feeling there was a problem like that, but I needed a night's sleep to see it. Aaron B. | [reply] |
by hdb (Monsignor) on Oct 05, 2013 at 14:48 UTC | |
by aaron_baugher (Curate) on Oct 05, 2013 at 15:38 UTC | |
| |
Re: Challenge: 8 Letters, Most Words
by hdb (Monsignor) on Oct 05, 2013 at 05:09 UTC | |
Another brute force method that puts all words in 2of12inf.txt in 8 letter classes and counts their members. Runs 40mins on my machine.
and here are my top ten:
| [reply] [d/l] [select] |
by hdb (Monsignor) on Oct 05, 2013 at 14:11 UTC | |
Cleaned up the code a bit:
So this is how it works:
Update: As I realized when reading Re^5: Challenge: 8 Letters, Most Words this solution will not be able to put two four letter words into one class irrespective of the letters. So in some situations it will not find the best solution. | [reply] [d/l] |
Re: Challenge: 8 Letters, Most Words (top 50)
by BrowserUk (Patriarch) on Oct 04, 2013 at 19:04 UTC | |
Using a slightly different approach, this is the top 50 from my dictionary:
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] [d/l] |
by Limbic~Region (Chancellor) on Oct 04, 2013 at 19:23 UTC | |
[reply] | |
by BrowserUk (Patriarch) on Oct 04, 2013 at 22:45 UTC | |
Hm. L~R: I've sat on these results for an hour now (it takes about 2 hours to run) whilst I looked for errors. I haven't seen anything obvious. But, you should probably take them with a pinch of salt. I see the possibility for something wrong in these results, but it is too late, and takes too long to verify them tonight. I'll attempt verification tomorrow. However, the results are interesting enough that I felt you might like to see them. The following two sets of results are from my (179k) dictionary and the (81k) 2of12int.txt. These results do allow for set of 8 letters than include duplicates, though none are in the top 50. What is intriguing (and worrying) is that word counts are identical; but the charsets are not!
I am convinced that the top result from both datasets is "the right answer"; despite the uncomfortable feeling from the concordance of the numbers and the discordance of the charsets. Tomorrow... With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] [d/l] |
by Limbic~Region (Chancellor) on Oct 05, 2013 at 02:43 UTC | |
by BrowserUk (Patriarch) on Oct 05, 2013 at 10:51 UTC | |
| |
Re: Challenge: 8 Letters, Most Words
by TJPride (Pilgrim) on Oct 04, 2013 at 20:28 UTC | |
Output: This should be 100% accurate with any given word list (of a-z only), though it does take a little while to run, and if anyone can think of a way to speed it up without compromising accuracy, by all means do so. | [reply] [d/l] [select] |
by Limbic~Region (Chancellor) on Oct 04, 2013 at 20:55 UTC | |
You filtered things out that shouldn't have been (capitalization/punctuation don't matter for instance). Here is the code that I am using to filter my list:
This should be 100% accurate with any given word list I will be using the same dictionary with my brute force solution so I will let you know though I won't be filtering out all the words you have. Cheers - L~R | [reply] [d/l] |
by TJPride (Pilgrim) on Oct 04, 2013 at 22:09 UTC | |
What do you mean by brute force, exactly? There are far too many possible letter combinations to just check each one. | [reply] |
by Limbic~Region (Chancellor) on Oct 05, 2013 at 00:17 UTC | |
by LanX (Saint) on Oct 05, 2013 at 00:39 UTC | |
| |
by choroba (Cardinal) on Oct 05, 2013 at 00:31 UTC | |
| |
by CountZero (Bishop) on Oct 04, 2013 at 21:08 UTC | |
CountZero A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James My blog: Imperial Deltronics | [reply] |
by McA (Priest) on Oct 04, 2013 at 20:50 UTC | |
Hi, you print a list, but what is the 8 character sequence which solves the problem? Is it the first one? McA | [reply] |
by TJPride (Pilgrim) on Oct 04, 2013 at 22:09 UTC | |
| [reply] |
by McA (Priest) on Oct 04, 2013 at 23:40 UTC | |
by TJPride (Pilgrim) on Oct 30, 2013 at 16:10 UTC | |
Re: Challenge: 8 Letters, Most Words
by kcott (Archbishop) on Oct 05, 2013 at 14:47 UTC | |
G'day Limbic~Region, Here's my take on a solution. It uses the 2of12inf.txt dictionary. It takes about a minute to run on my OS. [as always, YMMV] It doesn't use any particularly noticeable amounts of memory.
Output:
I also wrote a complementary script to check the results (this only takes a second or two to run):
Output:
-- Ken | [reply] [d/l] [select] |
Re: Challenge: 8 Letters, Most Words
by Tux (Canon) on Oct 05, 2013 at 08:51 UTC | |
I get 670 with aeilnrst using /usr/share/dict/words and 346 with aeinprst using 2of12inf, but I'd like to expand on this. All presented solutions take as a base that the letters should be able to create at least one 8-letter word. If I however assume that a 7-letter word with a random extra letter is able to create more words with shorter words, the number of possibilities explodes. Read more... (2 kB)
Read more... (2 kB)
Read more... (2 kB)
And here is my code: Read more... (3 kB)
Enjoy, Have FUN! H.Merijn | [reply] [d/l] [select] |
by tobyink (Canon) on Oct 05, 2013 at 20:33 UTC | |
Mine doesn't.
use Moops; class Cow :rw { has name => (default => 'Ermintrude') }; say Cow->new->name
| [reply] |
Re: Challenge: 8 Letters, Most Words
by davido (Cardinal) on Oct 07, 2013 at 20:41 UTC | |
I believe that this will always produce a correct result. So here is a solution that gets it right in under 1 minute 40 seconds, using SQLite: Update: Here's a newer version that has better reporting: Read more... (9 kB)
The result I get with $ time ./eightletters.pl is:
The logic works as follows:
In the database, the letter columns are in reverse-frequency order, or frequency-ascending order. That way, 'q' is the first letter column, and so on, finishing at 'e'. The queries also use this reverse-frequency order, so that the "AND"s within the "WHERE" clause can narrow down the list of records to include as early as possible. ...at least that's my theory. ;) The code is embarrassingly dirty, and there are several significant opportunities to further trim cycles. But I wanted to put the ideas out there in case someone else wanted to consider the methodology. Dave | [reply] [d/l] [select] |
Re: Challenge: 8 Letters, Most Words
by McA (Priest) on Oct 13, 2013 at 19:03 UTC | |
Hi L~R, hi all other interested monks. It's done. My program presented at Re^10: Challenge: 8 Letters, Most Words ended. My estimation was not too bad, as you can see:
This is worth of 7 days, 20 hours, 50 minutes, 25.517 seconds. IMHO THIS is a really (stupid) brute force solution. ;-) So, now to the result: I found two winners:
With both the program was able to build 337 words out of the dictionary. Here you can find the dictionary I worked on and the detailed result: http://pastebin.com/jU5cbzb8 Probably I will investigate the differencet to the solution of L~R. I'm sure it would be faster if he would run his script/program on my dictionary. :-) Best regards P.S.: If anybody is saying to you the well known mantra: Code first, optimize later! show him this node and answer: Time, resources and runtime are requirements, even when secondary ones. ;-) | [reply] [d/l] [select] |
Re: Challenge: 8 Letters, Most Words
by oiskuu (Hermit) on Oct 08, 2013 at 15:28 UTC | |
Nice challenge. I went for a particular brute force approach that requires 64bit perl with threads.
Filtering 2of12inf.txt gives 40933 words, 35893 of which incongruent.
Anyway, this is what I wrote: Read more... (2 kB)
| [reply] [d/l] [select] |
by oiskuu (Hermit) on Oct 09, 2013 at 14:33 UTC | |
Ok, I tried to filter the word set before going into loops. This appears to reduce the search space by a factor of 30.
Update2: added coverage check Read more... (2 kB) | [reply] [d/l] [select] |
Re: Challenge: 8 Letters, Most Words (2of12int.txt in 20^H^H 10 secs; Pure Perl)
by BrowserUk (Patriarch) on Oct 09, 2013 at 06:25 UTC | |
Exhaustive search should work for any dictionary. (The 10 secs doesn't include the sort, but it could use a high watermark.):
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] [d/l] |
by hdb (Monsignor) on Oct 09, 2013 at 09:40 UTC | |
It is quite impressive. I have not yet understood how it works. It seems not to work on the following dictionary:
where one solution is eehnortw covering four words but your script says [1] aceesttx : exactest. Update: After further study it looks to me, that you are checking all 8 letter classes derived from the 8 letter words in the dictionary and see how many words are captured. Which is probably giving you the correct solution for many real world dictionaries. This way, you do avoid the combinatorial explosion that makes this challenge difficult... Still quite impressive! | [reply] [d/l] [select] |
by BrowserUk (Patriarch) on Oct 09, 2013 at 16:37 UTC | |
It does recurse through all 8-letter combos, but short-circuits at each level if the combination so far cannot be derived from the dictionary supplied.
If the letter combination for your 'eehnortw' existed in the dictionary, it would be found very quickly:
Comment out the short-circuit and it will consider all 8-letter possibilities .. and run more slowly. With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] [d/l] [select] |
by BrowserUk (Patriarch) on Oct 09, 2013 at 18:47 UTC | |
There are 25 letter sets that will cover 4 words of your sample dictionary:
375 seconds isn't as impressive, but better than 24 hours :) This makes use of another optimisation that only benefits when the dictionary is small like yours:
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] [d/l] [select] |
by LanX (Saint) on Oct 09, 2013 at 12:20 UTC | |
Cheers Rolf ( addicted to the Perl Programming Language) | [reply] |
by hdb (Monsignor) on Oct 09, 2013 at 14:03 UTC | |
by LanX (Saint) on Oct 09, 2013 at 14:13 UTC | |
Re: Challenge: 8 Letters, Most Words
by duelafn (Parson) on Oct 11, 2013 at 12:40 UTC | |
Simple greedy algorithm takes only a 20-45 seconds to find some of the top contenders. Does better on my large wordlist than the smaller 2of12inf.txt. Didn't bother to extend to examine more branches or perform exhaustive search on last 2 or 3 letters since it did so well.
The Code: Read more... (4 kB)
Good Day, | [reply] [d/l] [select] |
Re: Challenge: 8 Letters, Most Words
by repellent (Priest) on Oct 31, 2013 at 18:06 UTC | |
My solution: <Reveal this spoiler or all in this thread>
My strategy is to aggregate up the word counts from candidates with the fewest letters to the most letters. (This approach is not thorough but is fast, see Update #2). The trick to make it fast is to realize that there is a one letter difference between candidate tiers, and to cycle through 26 letters for subset matching. My winner is aeilprst with it being able to make at least 343 words (using all 8 letters, or a subset of that). This may not be the best answer because of the simplistic aggregation, but it points to the likely champs. Update: I see what's wrong now. The aggregation has to keep track of contributing counts of specific candidates. Back to drawing board. Update #2: I have updated the script and results, which is more in line with what others have gotten. The total word counts do come up short because of the aggregation strategy. But the tradeoff in speed is significant. | [reply] [d/l] [select] |