in reply to regex problem

Well, a general principle is this:
# $re = exclude(@words);
sub exclude {
    my %words;
    push @{ $words{ quotemeta substr($_, 0, 1) } }, quotemeta substr($_, 1) for @_;
    my $first = "[^@{[ join '', keys %words ]}]*";
    my $rest = join "|", map "$_(?!" . join("|", @{ $words{$_} }) . ")", keys %words;
    return qr/^$first(?:(?:$rest)$first)*$/;
}

my $re = exclude(qw( this that those ));
# print $re;  # for debugging purposes

for ("I like this", "give me that one", "these rock!") {
    print "$_ => " . /$re/;
}
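For readers who want to see the shape of the generated pattern, here is a hand-expanded sketch of roughly what exclude(qw( this that those )) builds. The /x layout and the comments are added here for illustration and are not output copied from the thread; all three words share the first letter "t", so there is only one branch.

```perl
use strict;
use warnings;

# Hand-written equivalent of the pattern built for qw( this that those ).
my $re = qr/
    ^
    [^t]*                    # run of characters that cannot start an excluded word
    (?:
        t(?!his|hat|hose)    # a "t" is allowed only when it does not begin one of the words
        [^t]*                # followed by another run of safe characters
    )*
    $
/x;

for my $s ("I like this", "give me that one", "these rock!") {
    printf "%-18s => %s\n", $s, ($s =~ $re ? "ok" : "rejected");
}
```

Only "these rock!" passes, since its "t" is followed by "hese", which is none of the excluded tails.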

_____________________________________________________
Jeff[japhy]Pinyan: Perl, regex, and perl hacker.
s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

Re: Word Exclusion Regex (was Re: regex problem)
by blakem (Monsignor) on Feb 10, 2002 at 10:27 UTC
    As impressive as this is (and I haven't got it entirely figured out yet), there are a couple of bugs. $first contains extra spaces when the group includes words that start with different letters... localize $" or just do a boring join to fix that. Also, words with multiple occurrences of the first letter ('aabc' instead of 'abc') get excluded even when they shouldn't.

    The following output shows several incorrect cases using an exclude list of qw(dog cat pig):

    (?-xism:^[^p c d]*(?:(?:p(?!ig)|c(?!at)|d(?!og))[^p c d]*)*$)
    dog =>
    cat =>
    pig =>
    owl => 1
    ddog =>
    ccat =>
    ppig =>
    pdog =>
    pcat =>
    elephant =>
    ppppcatgggg =>

    -Blake
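The stray spaces in a character class like [^p c d] are what Perl's list separator $" would produce when an array is interpolated directly into a string. A minimal sketch of the two fixes mentioned (a plain join, or localizing $"), using a hypothetical @letters array in place of keys %words:

```perl
use strict;
use warnings;

my @letters = qw(p c d);    # stand-in for keys %words

# Interpolating an array joins its elements with $" (a single space by default).
my $spaced = "[^@letters]*";

# Joining explicitly first yields one string, so no separator is inserted.
my $joined = "[^@{[ join '', @letters ]}]*";

# Alternatively, empty $" for the duration of the interpolation only.
my $localized = do { local $" = ''; "[^@letters]*" };

print "$spaced\n$joined\n$localized\n";
```

The first line printed is [^p c d]*, the other two are [^pcd]*.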

      Oops, the original version used join() when creating $first. I don't know why I changed it. As for the other complaint, the regex is designed to ensure the words don't appear anywhere in the string. If you only wanted a regex that rejects a string that is exactly one of the words, it would look much simpler: /^(?!(?:cat|dog|pig)$)/. That's not what I was going for.
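The difference between the two goals can be sketched side by side. The first pattern is the simple one quoted in the reply; the second is NOT the exclude() regex but a swapped-in, simpler "no word at any position" pattern, used here only to illustrate the stricter semantics. The variable names are made up for this example.

```perl
use strict;
use warnings;

my @words = qw(cat dog pig);
my $alt   = join '|', map quotemeta, @words;

my $not_exactly  = qr/^(?!(?:$alt)$)/;       # string is not exactly one of the words
my $not_anywhere = qr/^(?:(?!$alt).)*$/s;    # no word starts at any position in the string

for my $s (qw(cat catalog ppppcatgggg owl)) {
    printf "%-12s not_exactly=%d not_anywhere=%d\n",
        $s, ($s =~ $not_exactly ? 1 : 0), ($s =~ $not_anywhere ? 1 : 0);
}
```

"catalog" and "ppppcatgggg" pass the first test but fail the second, which is why 'ddog' or 'pcat' being rejected is the intended behaviour of a contains-style exclusion.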

      _____________________________________________________
      Jeff[japhy]Pinyan: Perl, regex, and perl hacker.
      s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

        whoops... my bad. must have munged the regex myself somehow...

        Pardon my conceit, as I don't mean to contradict this captivating regex, but I purport that it's still not correct... Several words in that sentence happen to get incorrectly excluded for exclude('dog','cat','pig'): ;-P

        (?-xism:^[^pcd]*(?:(?:p(?!ig)|c(?!at)|d(?!og)))*[^pcd]*$)
        dog =>
        cat =>
        pig =>
        owl => 1
        conceit =>
        contradict =>
        captivating =>
        purport =>
        correct =>

        -Blake
        p.s. List obtained using:

        $ perl -lne 'print if /^[dpc].*[dpc]/ && !/dog|cat|pig/' /usr/dict/words
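One thing worth checking when reproducing these results: the pattern printed with the word list places the trailing [^pcd]* outside the repeated group, whereas the exclude() code as posted returns it inside the group. A small sketch comparing the two shapes on the same words (the variable names are made up here, and this is only a reading of the two patterns as they appear in the thread):

```perl
use strict;
use warnings;

my $first = '[^pcd]*';
my $rest  = 'p(?!ig)|c(?!at)|d(?!og)';

my $as_posted = qr/^$first(?:(?:$rest)$first)*$/;   # shape returned by exclude() as posted
my $as_quoted = qr/^$first(?:(?:$rest))*$first$/;   # shape shown with the word list

for my $w (qw(dog owl conceit contradict captivating purport correct)) {
    printf "%-12s posted=%d quoted=%d\n",
        $w, ($w =~ $as_posted ? 1 : 0), ($w =~ $as_quoted ? 1 : 0);
}
```

With the posted shape, every word except "dog" matches; with the quoted shape, only "owl" does, because a safe run is no longer allowed between two of the letters p, c, and d.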
Re: Word Exclusion Regex (was Re: regex problem)
by Juerd (Abbot) on Feb 09, 2002 at 21:57 UTC