Beefy Boxes and Bandwidth Generously Provided by pair Networks
No such thing as a small change
 
PerlMonks  

Pangram challenge: greed and Scrabble

by Athanasius (Archbishop)
on Apr 24, 2022 at 14:43 UTC ( [id://11143246]=perlmeditation: print w/replies, xml ) Need Help??

Task 2 of the current Perl Weekly Challenge is to generate a pangram from the English dictionary provided. “A pangram or holoalphabetic sentence is a sentence using every letter of a given alphabet at least once.”1 However, for this challenge the requirement that the words form a sentence is removed: a list of unrelated words is all that is required.

The bonus challenge is to “Constrain or optimize for something interesting,” and the first suggestion given is:

Shortest possible pangram (difficult)

Ok, so just how difficult would it be to perform an exhaustive search? I don’t know the answer, but since the search space increases exponentially as words are added to the dictionary, a search for sequences (of varying lengths) drawn from a pool of 23,610 words (see below) would appear highly impractical. I haven’t attempted it.

Before we proceed, it should be noted that the dictionary can be significantly reduced in size before the search begins. Where groups of words share the same set of letters, only one word from the group (the shortest) need be kept. For example, “abdicate,” “diabetic,” and “abdicated” all have the letter set {a, b, c, d, e, i, t}, so “abdicate” is retained but “diabetic” and “abdicated” are filtered out. This reduces the dictionary size from 39,172 to 23,610 — a saving of 15,562 words!

As noted above, 23,610 is still far too many words for an exhaustive search. So we’re looking for a way to get a close-to-optimum result in a reasonable time. And as we’ve just noted, each dictionary word can be considered as a set of its component letters. So the task is to find a minimum set cover, a well-studied problem in combinatorics and computer science.2 And there is an algorithm that is known to give good results in polynomial time: the greedy algorithm:

at each stage, choose the set that contains the largest number of uncovered elements.

My initial implementation of this algorithm produced the following pangram:

oversimplification ladybug hawk jazz equinox (40 letters)

I wasn’t satisfied with this, so began looking for ways to improve the search. It occurred to me that words with rare letters should be preferred over words with common letters. This immediately suggested the Scrabble3 points system:

const my %ALPHABET => ( a => 1, b => 3, c => 3, d => 2, e => 1, f => 4, g => 2, h => 4, i => 1, j => 8, k => 5, l => 1, m => 3, n => 1, o => 1, p => 3, q => 10, r => 1, s => 1, t => 1, u => 1, v => 4, w => 4, x => 8, y => 4, z => 10 );

With this I was able to give each word a weighting based on its component letters, and prefer words with higher weightings. This produced the following pangram:

sympathized quarterbacking reflexive jawbone (41 letters)

— no improvement at all! The problem appears to be that the selected words contain too much deadweight — too many redundant letters. So would it help to adjust the weighting for each word by deducting points for redundant letters? I experimented with different values of $DUP_WEIGHT, the amount to deduct for each redundant letter. Here are the results:

$DUP_WEIGHT Pangram Letter Count
1 sympathized quarterbacking jinx overflow 37
2 sympathized quacking overflow jinx be 33
3 sympathized quacking fox jaw verb all 32
4 sympathized unblock xv jaw qt frog 29

Eureka! At 29 letters, the last pangram is only 3 letters above the theoretical best-possible result of 26. (For values of $DUP_WEIGHT above 4, the result does not change.)

It should be emphasized that my improved results are not an improvement on the greedy algorithm. That algorithm applies to sets, whereas my search for pangrams is a search over words, which are essentially bags because they contain multiple instances of the same letters.

For the implementation, I found the Set::Tiny module very useful. I highly recommend it for working with sets of strings.

Wikipedia1 gives the following as a short pangram:

Waltz, bad nymph, for quick jigs vex. (28 letters)

Since all these words occur in the given dictionary, it would be possible for a search to improve on my best solution by at least one letter. Challenge: can anyone implement a search which finds this pangram (or a shorter one) within a reasonable time?

Full code listings for my solution (in Perl and Raku) will shortly be are available on the Weekly Challenge’s GitHub repository at https://github.com/manwar/perlweeklychallenge-club/tree/master/challenge-161/athanasius.

Cheers,

1“Pangram, ” Wikipedia, https://en.wikipedia.org/wiki/Pangram
2“Set cover problem,” Wikipedia, https://en.wikipedia.org/wiki/Set_cover_problem
3 “Scrabble letter distributions,” Wikipedia, https://en.wikipedia.org/wiki/Scrabble_letter_distributions#English

Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Replies are listed 'Best First'.
Re: Pangram challenge: greed and Scrabble
by tybalt89 (Monsignor) on Apr 25, 2022 at 15:15 UTC

    In playing around with various versions of this, I found that most 'good' solutions have both 'qt' and 'xv' in them, so this version starts with that. '@words' contains just the words with no duplicated letters. This code just picks those words from @words that have no previously used letters, and keeps adding words until it can't. Lines with 26 letters are declared WINNER!. The code at the end and the __DATA__ section are just for verifying that previously discovered lines are valid.

    This code is finding a 26 every 5 to 10 minutes, roughly.

    #!/usr/bin/perl use strict; # https://perlmonks.org/?node_id=11143246 use warnings; use List::AllUtils qw( uniq uniq_by ); my @anagrams = uniq_by {join '', sort /./g} do{ local (@ARGV, $/) = 'dictionary.txt'; <> =~ /^\w+$/gm }; my @words = grep ! /(.).*\1/, @anagrams; print @anagrams . " anagrams and " . @words . " uniletter words\n"; my @best; my $done; $SIG{INT} = sub {$done = 1}; #for my $w ( map { ($_) x 1e4 } grep /^qt$/, grep /q/, @words ) for my $w ( ('qt xv') x 1e7 ) { my @new = @words; my $line = $w; while( 1 ) { my @letters = uniq $line =~ /\w/g; my @new = grep ! /[@letters]/, @new; # ignore if existing letters @new or last; $line .= " " . $new[rand @new]; } my @all = uniq $line =~ /\w/g; @all >= 26 and print "WINNER!!\n"; @all >= 25 and print @all . " $line\n"; @all >= $#best and $best[@all] .= "$line\n"; $done and last; } print "\n\nBEST $#best\n", $best[-1]; print "\n"; for ( <DATA> ) { my @dups = /(\w)(?=.*\1)/g; printf "%-40s length: %2d dups: (%s)\n", s/\n//r, tr/a-z//, "@dups" + ; } # these are found 26 letter sentences from various versions __DATA__ qt jokes zinc flag rpm why bud xv qt xv dumps chalk frenzy jig bow qt xv grandly hump wicks job fez qt xv bled frank cum why zip jogs qt xv zoned brag whys flick jump qt xv cranked jugs bop whiz m fly qt xv spindly chow bark m jug fez qt xv junk child prows bag my fez qt xv furls peck dazing m why job qt xv crumbly ponds hawk fez jig qt xv lynched frog jump bask wiz qt xv cods blink rpm why jug a fez qt xv gambled whiz junk cops fry qt xv brazens flip jug why dock m qt xv frenzy jambs howl puck dig qt xv hunks comb ply draw jig fez qt xv dawns chum flog jerk by zip qt xv folks whiz pend cram by jug qt xv pucks lard hymn bow fez jig qt xv block daring jumps fez why qt xv fazing clerk dumps why job qt xv dampen blocks jug whiz fry qt xv cashed junk flog rpm by wiz qt xv parked wiz ms bunch jog fly qt xv drops clef jab wiz gym hunk qt xv fled burn jack ms zip go why qt xv golfed cramps junk whiz by qt xv grazed flunk cs whip my job qt xv spanked whiz fry m jog club qt xv punks decry job flag m whiz qt xv blocs germ fad zip why junk qt xv cranked ms gulf zip job why qt xv jambs pend lock hug wiz fry qt xv chasm glib pod wry junk fez qt xv warns hold fez jug pick by m qt xv flunks gym carped whiz job qt xv brazen flick jugs pod m why qt xv gowns humbled jack zip fry qt xv flocked spy barn jug whiz m qt xv sunk wild parch fez jog m by qt xv drugs why fleck an zip m job qt xv zincs bug lymph dwarf joke qt xv fly bronzed whacks m jug pi qt xv don bumps chalk wry jig fez qt xv job whacks fen zip lug m dry qt xv gulped fry m snack whiz job qt xv drench saw folk jug zip by m qt xv barking clods jump why fez qt xv pricks glazed job fun m why qt xv dos whiz clap beg junk m fry qt xv wizards fleck hymn jug bop qt xv flesh crank pudgy job m wiz qt xv darn ply hicks womb jug fez qt xv land chimp brow jug fez sky qt xv wrongly hacks jump bid fez qt xv gulf nymphs racked job wiz qt xv sky lamp birch down jug fez qt xv hybrid jack plows fez m gnu qt xv jar zip fleck buns why dog m qt xv snugly whack drip m fez job qt xv lawn hick jogs dumb fez pry qt xv wrongly dish jack m pub fez qt xv wiz blonds peck ham jug fry qt xv molds whack zip fern jug by qt xv draws jig hock fez bun m ply qt xv blocks jump and fez why rig qt xv workbench jams fly zip dug qt xv block mind jaw spry fez hug qt xv dam fleck shy wrung job zip qt xv dingy hulks rpm fez caw job qt xv drags clef knob jumpy whiz qt xv whiz neck pro dabs fly m jug qt xv blazing how decks fry jump qt xv frenzy jump clog hawks bid qt xv fogs crazed blip m why junk qt xv chipmunks drawl jog by fez qt xv when zip log ducks jamb fry qt xv neck drags fly up whiz m job qt xv bucks flared nymph wiz jog qt xv bend flash rpm yuck wiz jog qt xv drench bulks jaw fog my zip qt xv branch dump low jig fez sky qt xv gumdrops why clink jab fez qt xv gyms brunched folk zip jaw qt xv men grasp duck job whiz fly qt xv calf shrew junk dog zip m by

    Sample Output:

    36523 anagrams and 9836 uniletter words 25 qt xv combed pranks flu why jig 25 qt xv hawks bog jumpy rind clef 25 qt xv duck wrens flap him jog by 25 qt xv gelds work jay bunch zip m 25 qt xv curbs flanked why m go zip 25 qt xv pens balmy frock whiz jug 25 qt xv chunks job pad grim we fly 25 qt xv creaks bond m whiz ply jug 25 qt xv doze brash puck wing m fly 25 qt xv jangled spry wiz bum hock 25 qt xv jumped flock rang whiz by 25 qt xv clasped frog hunk by m wiz 25 qt xv plumbed choking fry jaws 25 qt xv jibed chaps flunky m grow 25 qt xv wry hip flag junked cs mob 25 qt xv chinks daze plumb jog wry 25 qt xv forks paw lend chum jig by 25 qt xv dolphins jaw bug fez cry m 25 qt xv cashew bird flop gym junk 25 qt xv grands jump clef wiz by ho 25 qt xv blocks find gym her jaw up WINNER!! 26 qt xv daze nymph flow bricks jug 25 qt xv jumble frock spading why 25 qt xv franks behold gym cup wiz 25 qt xv refunds gym blow hack zip 25 qt xv cob flask dine rpm why jug 25 qt xv fazing cords kelp why bum 25 qt xv shingled fry puck jaw mob ^C BEST 26 qt xv daze nymph flow bricks jug qt jokes zinc flag rpm why bud xv length: 26 dups: () qt xv dumps chalk frenzy jig bow length: 26 dups: () qt xv grandly hump wicks job fez length: 26 dups: () qt xv bled frank cum why zip jogs length: 26 dups: () qt xv zoned brag whys flick jump length: 26 dups: () qt xv cranked jugs bop whiz m fly length: 26 dups: () qt xv spindly chow bark m jug fez length: 26 dups: () qt xv junk child prows bag my fez length: 26 dups: () qt xv furls peck dazing m why job length: 26 dups: () qt xv crumbly ponds hawk fez jig length: 26 dups: () qt xv lynched frog jump bask wiz length: 26 dups: () qt xv cods blink rpm why jug a fez length: 26 dups: () qt xv gambled whiz junk cops fry length: 26 dups: () qt xv brazens flip jug why dock m length: 26 dups: () qt xv frenzy jambs howl puck dig length: 26 dups: () qt xv hunks comb ply draw jig fez length: 26 dups: () qt xv dawns chum flog jerk by zip length: 26 dups: () qt xv folks whiz pend cram by jug length: 26 dups: () qt xv pucks lard hymn bow fez jig length: 26 dups: () qt xv block daring jumps fez why length: 26 dups: () qt xv fazing clerk dumps why job length: 26 dups: () qt xv dampen blocks jug whiz fry length: 26 dups: () qt xv cashed junk flog rpm by wiz length: 26 dups: () qt xv parked wiz ms bunch jog fly length: 26 dups: () qt xv drops clef jab wiz gym hunk length: 26 dups: () qt xv fled burn jack ms zip go why length: 26 dups: () qt xv golfed cramps junk whiz by length: 26 dups: () qt xv grazed flunk cs whip my job length: 26 dups: () qt xv spanked whiz fry m jog club length: 26 dups: () qt xv punks decry job flag m whiz length: 26 dups: () qt xv blocs germ fad zip why junk length: 26 dups: () qt xv cranked ms gulf zip job why length: 26 dups: () qt xv jambs pend lock hug wiz fry length: 26 dups: () qt xv chasm glib pod wry junk fez length: 26 dups: () qt xv warns hold fez jug pick by m length: 26 dups: () qt xv flunks gym carped whiz job length: 26 dups: () qt xv brazen flick jugs pod m why length: 26 dups: () qt xv gowns humbled jack zip fry length: 26 dups: () qt xv flocked spy barn jug whiz m length: 26 dups: () qt xv sunk wild parch fez jog m by length: 26 dups: () qt xv drugs why fleck an zip m job length: 26 dups: () qt xv zincs bug lymph dwarf joke length: 26 dups: () qt xv fly bronzed whacks m jug pi length: 26 dups: () qt xv don bumps chalk wry jig fez length: 26 dups: () qt xv job whacks fen zip lug m dry length: 26 dups: () qt xv gulped fry m snack whiz job length: 26 dups: () qt xv drench saw folk jug zip by m length: 26 dups: () qt xv barking clods jump why fez length: 26 dups: () qt xv pricks glazed job fun m why length: 26 dups: () qt xv dos whiz clap beg junk m fry length: 26 dups: () qt xv wizards fleck hymn jug bop length: 26 dups: () qt xv flesh crank pudgy job m wiz length: 26 dups: () qt xv darn ply hicks womb jug fez length: 26 dups: () qt xv land chimp brow jug fez sky length: 26 dups: () qt xv wrongly hacks jump bid fez length: 26 dups: () qt xv gulf nymphs racked job wiz length: 26 dups: () qt xv sky lamp birch down jug fez length: 26 dups: () qt xv hybrid jack plows fez m gnu length: 26 dups: () qt xv jar zip fleck buns why dog m length: 26 dups: () qt xv snugly whack drip m fez job length: 26 dups: () qt xv lawn hick jogs dumb fez pry length: 26 dups: () qt xv wrongly dish jack m pub fez length: 26 dups: () qt xv wiz blonds peck ham jug fry length: 26 dups: () qt xv molds whack zip fern jug by length: 26 dups: () qt xv draws jig hock fez bun m ply length: 26 dups: () qt xv blocks jump and fez why rig length: 26 dups: () qt xv workbench jams fly zip dug length: 26 dups: () qt xv block mind jaw spry fez hug length: 26 dups: () qt xv dam fleck shy wrung job zip length: 26 dups: () qt xv dingy hulks rpm fez caw job length: 26 dups: () qt xv drags clef knob jumpy whiz length: 26 dups: () qt xv whiz neck pro dabs fly m jug length: 26 dups: () qt xv blazing how decks fry jump length: 26 dups: () qt xv frenzy jump clog hawks bid length: 26 dups: () qt xv fogs crazed blip m why junk length: 26 dups: () qt xv chipmunks drawl jog by fez length: 26 dups: () qt xv when zip log ducks jamb fry length: 26 dups: () qt xv neck drags fly up whiz m job length: 26 dups: () qt xv bucks flared nymph wiz jog length: 26 dups: () qt xv bend flash rpm yuck wiz jog length: 26 dups: () qt xv drench bulks jaw fog my zip length: 26 dups: () qt xv branch dump low jig fez sky length: 26 dups: () qt xv gumdrops why clink jab fez length: 26 dups: () qt xv gyms brunched folk zip jaw length: 26 dups: () qt xv men grasp duck job whiz fly length: 26 dups: () qt xv calf shrew junk dog zip m by length: 26 dups: ()
Re: Pangram challenge: greed and Scrabble
by Limbic~Region (Chancellor) on May 12, 2022 at 18:45 UTC

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlmeditation [id://11143246]
Front-paged by haukex
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others examining the Monastery: (5)
As of 2024-03-29 15:08 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found