http://qs1969.pair.com?node_id=11136479


in reply to Re^2: Problem with regex wildcard operator (.)...?
in thread Problem with regex wildcard operator (.)...?

OK, I shortened it, hope this is more acceptable. In the process I discovered, as I suspected, that it has something to do with the regex order, but exactly what is still pretty opaque to me.

    #!/usr/bin/perl

    my @words = ('aa', 'as', 'es', 'is', 'os', 'sh', 'si', 'so');
    print "WORDS: @words\n";
    print "----------------------------\n";

    # Each block sorts the letters of every word, then tests the
    # sorted string against the anchored pattern ^$regex$.
    my $regex = ".?a?";
    print "Matches for regex pattern : $regex\n\n";
    foreach (@words) {
        if (join('', sort split //, $_) =~ /^$regex$/) {
            print "$_\n";
        }
    }
    print "----------------------------\n";

    $regex = "a?.?";    # reuse $regex; a second "my" would mask the first
    print "Matches for regex pattern : $regex\n\n";
    foreach (@words) {
        if (join('', sort split //, $_) =~ /^$regex$/) {
            print "$_\n";
        }
    }
    print "----------------------------\n";

    $regex = "s?.?";
    print "Matches for regex pattern : $regex\n\n";
    foreach (@words) {
        if (join('', sort split //, $_) =~ /^$regex$/) {
            print "$_\n";
        }
    }
    print "----------------------------\n";

    $regex = ".?s?";
    print "Matches for regex pattern : $regex\n\n";
    foreach (@words) {
        if (join('', sort split //, $_) =~ /^$regex$/) {
            print "$_\n";
        }
    }
    print "----------------------------\n";

OUTPUT

    WORDS: aa as es is os sh si so
    ----------------------------
    Matches for regex pattern : .?a?

    aa
    ----------------------------
    Matches for regex pattern : a?.?

    aa
    as
    ----------------------------
    Matches for regex pattern : s?.?

    ----------------------------
    Matches for regex pattern : .?s?

    as
    es
    is
    os
    sh
    si
    so
    ----------------------------

Regards.

Re^4: Problem with regex wildcard operator (.)...?
by ikegami (Patriarch) on Sep 05, 2021 at 21:28 UTC

    This is what you should get:

        words:   aa as es is os sh si so
        string:  aa as es is os hs is os
        ^.?a?$:  y  n  n  n  n  n  n  n   aa
        ^a?.?$:  y  y  n  n  n  n  n  n   aa as
        ^s?.?$:  n  n  n  n  n  n  n  n
        ^.?s?$:  n  y  y  y  y  y  y  y   as es is os sh si so

    That's exactly what you get. What did you expect to be different?

    Seeking work! You can reach me at ikegami@adaelis.com

Re^4: Problem with regex wildcard operator (.)...?
by LanX (Saint) on Sep 05, 2021 at 21:23 UTC
    You should also tell us what you expected and where it differs ..

    I think you are trying to re-sort the characters of the input strings alphabetically (why?). Maybe print those temporary strings too, for debugging purposes.

    I'm no Scrabble player and I'm not sure why you do what you do.

    What's the concept here?

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

Re^4: Problem with regex wildcard operator (.)...?
by AnomalousMonk (Archbishop) on Sep 06, 2021 at 03:04 UTC
    ... it has something to do with the regex order ...

    The regexes actually being used are all anchored with ^ $ assertions (see perlre, perlretut). That means that certain characters have to appear at the beginning or end of certain strings for a match to occur.

    I'm also unsure of just what you expect to get, but consider this code:

        Win8 Strawberry 5.8.9.5 (32) Sun 09/05/2021 22:17:31
        C:\@Work\Perl\monks
        >perl -Mstrict -Mwarnings
        use Data::Dump qw(dd);

        my @words = (  # added a few extra 'words'
            '', 'x', 'a', 's',
            'aa', 'as', 'es', 'is', 'os', 'sh', 'si', 'so'
        );
        dd 'WORDS:', \@words;
        print "----------------------------\n";

        for my $regex (".?a?", "a?.?", "s?.?", ".?s?",) {
            print "Matches for ACTUAL regex pattern : ^$regex\$\n\n";
            foreach my $word (@words) {
                my $sorted = join '', sort split //, $word;
                if ($sorted =~ /^$regex$/) {
                    print "'$sorted' -> '$word' ";
                }
            }
            print "\n----------------------------\n";
        }
        ^Z
        (
          "WORDS:",
          ["", "x", "a", "s", "aa", "as", "es", "is", "os", "sh", "si", "so"],
        )
        ----------------------------
        Matches for ACTUAL regex pattern : ^.?a?$

        '' -> '' 'x' -> 'x' 'a' -> 'a' 's' -> 's' 'aa' -> 'aa'
        ----------------------------
        Matches for ACTUAL regex pattern : ^a?.?$

        '' -> '' 'x' -> 'x' 'a' -> 'a' 's' -> 's' 'aa' -> 'aa' 'as' -> 'as'
        ----------------------------
        Matches for ACTUAL regex pattern : ^s?.?$

        '' -> '' 'x' -> 'x' 'a' -> 'a' 's' -> 's'
        ----------------------------
        Matches for ACTUAL regex pattern : ^.?s?$

        '' -> '' 'x' -> 'x' 'a' -> 'a' 's' -> 's' 'as' -> 'as' 'es' -> 'es' 'is' -> 'is' 'os' -> 'os' 'hs' -> 'sh' 'is' -> 'si' 'os' -> 'so'
        ----------------------------

    Note that I've added '' (empty string) 'x' 'a' 's' to the @words array. Note also that all the regexes in question match all these new strings.

    The complete regex ^.?a?$ matches any zero- or one-character string. The regex can match some two-character strings. For such a match, an 'a' must be at the absolute end of the string (or before a newline at the end of the string). There's only one two-character string in @words that, after the string is sorted, matches this regex: 'aa'. (Update: Neither this regex nor any of the others discussed can match a string of more than two characters.)

    The regex ^a?.?$ matches any zero- or one-character string and some two-character strings. For a two-character match, an 'a' must be at the absolute beginning of the string. There are two, two-character strings in @words that, after being sorted, match this regex: 'aa' 'as'.

    The regex ^s?.?$ doesn't match any two-character strings because no string in @words, after being sorted, begins with an 's'.

    The regex ^.?s?$ matches almost every two-character string in @words because almost every such string, after being sorted, ends in 's'.
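
    The four cases above can be re-checked with a small sketch. The helper names sorted_letters and sorted_matches are hypothetical (they are not from the original script); the sketch only assumes, as in the code posted earlier, that each word's letters are sorted before the anchored pattern is applied.

    ```perl
    use strict;
    use warnings;

    # Sort the letters of a word, as the original script does.
    sub sorted_letters {
        my ($word) = @_;
        return join '', sort split //, $word;
    }

    # Hypothetical helper: true if the SORTED form of $word matches
    # the pattern once it is anchored with ^ and $.
    sub sorted_matches {
        my ($word, $pattern) = @_;
        return sorted_letters($word) =~ /^$pattern$/ ? 1 : 0;
    }

    my @words = qw(aa as es is os sh si so);

    for my $pattern ('.?a?', 'a?.?', 's?.?', '.?s?') {
        # Collect the original words whose sorted letters match.
        my @hits = grep { sorted_matches($_, $pattern) } @words;
        print "^$pattern\$ : @hits\n";
    }
    ```

    Printing sorted_letters($word) next to each word makes it easier to see which string the regex is really being tested against.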


    Give a man a fish:  <%-{-{-{-<

Re^4: Problem with regex wildcard operator (.)...?
by Marshall (Canon) on Sep 06, 2021 at 04:55 UTC
    I am still not "getting it" in terms of your input syntax and how it should match against the dictionary.

    I recently wrote a "cheater program" for a less sophisticated game, but perhaps with some similar matching requirements?

    In my game, I have a set of "tiles", letters that I can use to form words. I cannot use any letter other than one of these letters to form words. There are no 2-letter words in my dictionary. Your dictionary has a lot of weird words, but I guess that is how Scrabble is!

    Your example is a bit hard for me to figure out, since es, os, sh and si are not "normal English words", although as, is and so are. Likewise, "aa" is not a normal English word, although "as" is. There are words in the Scrabble dictionary that I've never read, heard or used.

    What I'd like to see is a user session that shows how your scrabble cheat game works. I will attach a user example from my cheat program for another game. Once we understand what the code is supposed to do, then at least I would be interested in helping you create a "blackbelt" level cheat program - just for the fun of it!

    Here is an example of the kind of example that I want from you:

        Example 1: leading ";" means this is a list of letters instead of a pattern
        list of letters or pattern: ;lolewf
        list of letters or pattern: ---
        The --- says: show me all 3 letter words that can be formed
        from my letter "universe".
        elf ell few foe fol low owe owl woe

        # Example 2:
        # Show me all 4 letter words which can be formed from
        # the "list of letters" that have "l" as the 3rd letter
        list of letters or pattern: ;odlswe    # List of Letters
        list of letters or pattern: --l-       # A search pattern
        dole owls sold sole weld wold

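    A minimal sketch of how a search like the session above might work. It is not the actual cheat program: it assumes each tile can be used at most once and that "-" in a pattern stands for any single letter. The sub names (can_form, pattern_to_regex, find_words) and the tiny word list are hypothetical.

    ```perl
    use strict;
    use warnings;

    # True if $word can be spelled from $tiles, using each tile at most once.
    sub can_form {
        my ($word, $tiles) = @_;
        my %available;
        $available{$_}++ for split //, lc $tiles;
        for my $letter (split //, lc $word) {
            return 0 unless ($available{$letter} // 0) > 0;
            $available{$letter}--;
        }
        return 1;
    }

    # Turn a pattern such as "--l-" into an anchored regex:
    # "-" becomes "." (any one character), other characters are literal.
    sub pattern_to_regex {
        my ($pattern) = @_;
        my $body = join '',
            map { $_ eq '-' ? '.' : quotemeta($_) } split //, lc $pattern;
        return qr/\A$body\z/;
    }

    # Return the dictionary words that fit the pattern and the tiles, sorted.
    sub find_words {
        my ($tiles, $pattern, @dictionary) = @_;
        my $regex = pattern_to_regex($pattern);
        my @hits  = grep { $_ =~ $regex && can_form($_, $tiles) } @dictionary;
        return sort @hits;
    }

    # Hypothetical mini-dictionary, for illustration only.
    my @dictionary = qw(dole doll low owls sled sold sole weld wold);
    print join(' ', find_words('odlswe', '--l-', @dictionary)), "\n";
    ```

    Counting tiles in a hash keeps the "each tile once" rule separate from the pattern check, so a different pattern syntax could be swapped in without touching can_form.
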
    I really don't know anything about Scrabble. But I think you will have to develop a syntax to input words that are already on the board. I think if some guy plays "pagan", you can play "ize" to turn this into "paganize" and score big points. In addition, each letter has some kind of "value": playing a "z" is worth more than playing an "a". I suspect that you are going to want a word sort order that, instead of being alphabetical, is based upon the point value of the word.

    I think your tr statement could be better:
    tr is a very stupid thing and it does not use character sets. I am not sure why your code seems to work.

        use strict;
        use warnings;

        my $string = '\abCD';
        $string =~ tr/\[A-Z]/[a-z]/;    # should be tr/A-Z/a-z/?
        print "$string\n";

      I am not sure why your code seems to work.

      It works because \[ is just an escaped [, so tr/\[A-Z]/[a-z]/ is translating [ to [, ] to ], and uppercase to lowercase. Any other character, e.g., \, is unaffected.
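
      A short sketch to confirm this. The test string is made up for illustration; it contains a backslash, brackets, and mixed-case letters so that each case mentioned above is exercised.

      ```perl
      use strict;
      use warnings;

      # Made-up input: a backslash, lower- and uppercase letters, and brackets.
      my $original = '\abCD [XY]';

      # Copy the string, then transliterate the copy with each form.
      (my $with_brackets = $original) =~ tr/\[A-Z]/[a-z]/;
      (my $plain         = $original) =~ tr/A-Z/a-z/;

      print "$with_brackets\n";
      print "$plain\n";
      print $with_brackets eq $plain ? "same result\n" : "different result\n";
      ```

      Both forms lowercase the letters and leave the backslash and brackets as they were; the bracketed form just spends two extra mappings translating [ and ] to themselves.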


      Give a man a fish:  <%-{-{-{-<

        Thanks for the explanation! I did not think that an escape '\' would work within tr. Even though that does indeed work, I prefer my tr/A-Z/a-z/ because it is easy for me to understand at a glance. Of course, mileage varies.
Re^4: Problem with regex wildcard operator (.)...?
by sbrothy (Acolyte) on Sep 05, 2021 at 22:13 UTC

    I'm no longer sure. I've had a beer by now so I'll look at it tomorrow. Thank you for your time nonetheless. I'll reread the perldocs on regexes before I bother you again. Regards.