gje21c has asked for the wisdom of the Perl Monks concerning the following question:
Hi Monks,
Immediate confession ... this is a Scrabble-related question of low importance ! And it's a pure regex question, but I've found the Perl Monks are the best for these things, just as Perl regex is certainly the best, I always use it, even from the command line.
I'd like to search a list of words (a dictionary list) to find those I can make with my letters. So, if I have AABCDEF I'd like to find all words with 1 to 7 of those letters in. I need a regexp like AABCDEF{1,7} but which only allows each letter to occur once, to mimic the fact that you have only one physical tablet per letter (stating the obvious here but, the plainer the context the better).
I hope this isn't obvious. I can't think of an option that does it, and intuitively that once-only using up of tokens is a counting thing quite alien to regex's pattern matching. But you never know.
ATdhvaannkcse, Greg E
Re: Regex question once-only use of chars in a charset
by CountZero (Bishop) on May 15, 2011 at 13:19 UTC
|
I think it can be as simple as this: use Modern::Perl;
open my $WORDLIST, '<', './wordlist.txt' or die $!;
my $available = 'AABCDEF';
$available = join '?', sort split '', $available;
$available .= '?';
while (<$WORDLIST>) {
chomp;
my $sorted = join '', sort split '';
say if $sorted =~ /^$available$/io;
}
Running this script with your 'AABCDEF' gives me the following results:abe
ace
aced
baa
bad
bade
be
bead
bed
cab
cad
cade
cafe
dab
dace
deaf
deb
decaf
fab
facade
face
faced
fad
fade
fed
I use a 58,000 elements wordlist and it needed less than a few seconds to generate this result.
CountZero A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James
| [reply] [Watch: Dir/Any] [d/l] [select] |
|
Cool! Sorting the word as well is a game changer! Then you just check off the letters in order.
| [reply] [Watch: Dir/Any] |
|
It is an old trick. Transform both sides of the comparison to a canonical form and the whole problems becomes much easier to solve.
CountZero A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James
| [reply] [Watch: Dir/Any] |
|
What's the o in say if $sorted =~ /^$available$/io;?
I know i is for case-insensitive matching, but I can't find an o modifier in the documentation.
| [reply] [Watch: Dir/Any] [d/l] |
|
>perl -wMstrict -le
"my $s = '1a2b3c';
;;
print qq{no /o};
for my $i (qw(3 2 1)) {
print qq{matched '$1'} if $s =~ m{ ($i.) }xms;
}
;;
print qq{with /o};
for my $i (qw(3 2 1)) {
print qq{matched '$1'} if $s =~ m{ ($i.) }xmso;
}
"
no /o
matched '3c'
matched '2b'
matched '1a'
with /o
matched '3c'
matched '3c'
matched '3c'
The function of the /o modifier has been generally replaced by the qr// regex object builder (see in perlop).
I was a bit surprised not to see anything about /o in perlre, but it is (briefly and obliquely) discussed in qr/STRING/msixpodual (5.14), and the following remains in perlretut (at least through 5.12):
Optimizing pattern evaluation
We pointed out earlier that variables in regexps are substituted before
the regexp is evaluated:
$pattern = 'Seuss';
while (<>) {
print if /$pattern/;
}
This will print any lines containing the word "Seuss". It is not as
efficient as it could be, however, because Perl has to re-evaluate (or
compile) $pattern each time through the loop. If $pattern won't be
changing over the lifetime of the script, we can add the "//o" modifier,
which directs Perl to only perform variable substitutions once:
#!/usr/bin/perl
# Improved simple_grep
$regexp = shift;
while (<>) {
print if /$regexp/o; # a good deal faster
}
| [reply] [Watch: Dir/Any] [d/l] [select] |
Re: Regex question once-only use of chars in a charset
by GrandFather (Saint) on May 15, 2011 at 07:04 UTC
|
I doubt this fits what you have described, but it may be of interest:
#!/usr/local/bin/perl
use strict;
use warnings;
my @words = split /\s+/, <<WORDS;
a alien all allows always an and are as best but can certainly command
confession context counting dictionary does each even fact find for fo
+und from
have here hope i if immediate importance intuitively is just letter le
+tters like
list low make mimic monks my need never obvious occur of one only opti
+on pattern
per perl physical plainer pure question quite regex regexp search tabl
+et that
the these thing think this those to tokens up use using which with wor
+ds you
WORDS
my %normLu;
push @{$normLu{normWord ($_)}}, $_ for @words;
my $match = buildRegex ('adeilln');
my @matches = map {@{$normLu{$_}}} grep {/^$match$/} keys %normLu;
print "@matches\n";
sub normWord {
return join '', sort split '', $_[0];
}
sub buildRegex {
my ($word) = @_;
my %freq;
++$freq{$_} for split '', $word;
return join '', map {"$_\{0,$freq{$_}\}"} sort keys %freq;
}
Prints:
alien all and an a i
True laziness is hard work
| [reply] [Watch: Dir/Any] [d/l] [select] |
Re: Regex question once-only use of chars in a charset
by Anonymous Monk on May 15, 2011 at 06:45 UTC
|
Use the various search options ( http://search.cpan.org,...) and find code to generate permutation as you want, and generate a list of words (I believe there is an example in Higher Order Perl).Then give this list to Regexp::Trie or Regexp::Assemble.
It might be a roundabout way of doing it, and it might run a bit slow to generate the regex pattern, but it won't require learning anything :O and the regex will be optimized :D | [reply] [Watch: Dir/Any] |
Re: Regex question once-only use of chars in a charset
by JavaFan (Canon) on May 15, 2011 at 21:42 UTC
|
Eh, why a regexp? This is what I use:
#!/usr/bin/perl
use 5.010;
use strict;
use warnings;
use Getopt::Long;
use List::Util 'sum';
my %tiles = qw [
A 1 B 3 C 3 D 2 E 1 F 4 G 2 H 4 I 1
J 8 K 5 L 1 M 3 N 1 O 1 P 3 Q 10 R 1
S 1 T 1 U 1 V 4 W 4 X 8 Y 4 Z 10
];
die "No rack?\n" unless @ARGV;
my ($rack, $blanks) = (@ARGV, 0);
my $full = length($rack) + $blanks == 7;
my %target;
$target{$_}++ for split //, $rack;
my @words = `cat /usr/share/dict/words`;
chomp @words;
my @good;
my %discount;
WORD: foreach my $word (@words) {
next unless $word =~ /^[a-z]+$/;
my %copy = %target;
my $b = 0;
my $d = 0;
foreach my $c (split //, $word) {
if (--$copy{$c} < 0) {
$b++;
$d += $tiles{uc $c};
$discount{$word} = $d;
}
next WORD if $b > $blanks;
}
push @good, $word;
}
my $tl = length($rack) + $blanks;
@good = sort {$a->[1] <=> $b->[1]}
map {[$_, sum (map {$tiles{uc $_}} split //) -
($discount{$_} || 0) +
($full && length($_) == $tl ? 50 : 0)]} @good;
printf "%2d: %s\n", $_->[1], $_->[0] for @good;
__END__
Usage: ./program rack [nr-of-blanks], were rack are the non-blank tiles, and there's an optional number of blanks. | [reply] [Watch: Dir/Any] [d/l] [select] |
Re: Regex question once-only use of chars in a charset
by John M. Dlugosz (Monsignor) on May 15, 2011 at 10:32 UTC
|
I think you can start with a simple regex that pulls those words from the main dictionary that contain only the allowed letters and are not too long. Make the regex a subset of the full test. But that will cull the list of a quarter million words down to a much smaller number, in an in-memory @list.
Then apply the full analysis to only the entries on that list. You might employ subsequent passes with patters generated for the specific case: if you have 1 letter E, look for anything containing two E's and scratch them off, etc. You could generate such a case for each unique letter and then join them together with '|' to make a rejection test. Final pattern would be /e.*e|f.*f.*f|…/ to tell you if the candidate word has more than 1 E, or more than 2 Fs, etc.
So you see the test for more than N of the same letter in your rack is N+1 copies separated by ".*".
| [reply] [Watch: Dir/Any] |
Re: Regex question once-only use of chars in a charset
by gje21c (Acolyte) on May 20, 2011 at 22:23 UTC
|
Thanks everyone.
Those approaches will be very useful. I could have made it clearer I guess, that I can program my way to a solution, but was wondering if regex would have it built-in, so I could just use "cat dictionary-file | grep /Z1[AABCDEF]/Z2where Z1 or Z2 is a magic modifer that forces each letter in the [ ]set to be used only once. It looks like not. But you never know, there's a world of functionality in regexp and anything that feels like it could be in there, or would have been asked before, usually is in there.
Well back to WordsWithFriends on iPhone ! The dictionary for that game has 173000 words and a vast number I have never heard of, not to mention English/US/other spellings etc. My friends just experiment with words until they find something it accepts and I need to fight back ! Not really fair but I need a few wins.
| [reply] [Watch: Dir/Any] [d/l] [select] |
|
|