in reply to Re: regular expressions
in thread regular expressions

I do not think that a negated character class is a good idea for finding groups of consonants because, for example, it will also pick up groups of digits, as shown below under the Perl debugger:
  DB<1> $_ ="123 annn jkjkkj bcdefgh 2015 ";
  DB<2> push @words, grep { /[^aeiouy]{4}/i } split;
  DB<3> x \@words;
0  ARRAY(0x600500b18)
   0  'jkjkkj'
   1  2015
  DB<4>

Re^3: regular expressions
by AnomalousMonk (Archbishop) on Jun 07, 2015 at 18:02 UTC

    I agree that doubly-negated character classes can be very tricky, but with care, they can be managed to good effect.

    I think of it this way: Start with  [^\W] which is the same as  [\w] (or just \w). As you point out, this includes digits and _ (underscore) as well as alphas. "Subtract", as it were, the digits with  [^\W\d] and underscore with  [^\W\d_] and you're left with all alpha characters. Then subtract your chosen vowels  [^\W\d_aeiouyAEIUOY] and you're done!
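The subtraction steps above can be sketched as follows. This is only an illustration: the sample characters and the helper name `accepted` are made up for the example, and only ASCII input is considered.

```perl
use strict;
use warnings;

# Sample characters: letters, a space, a digit, underscore, punctuation.
my @chars = split //, 'aEyb Z7_-';

# Return the sample characters that a single-character class accepts.
sub accepted {
    my ($class) = @_;
    return join '', grep { /\A$class\z/ } @chars;
}

# Each step "subtracts" more characters from what the class accepts.
my @steps = (
    [ '[^\W]'                  => qr{ [^\W] }x ],
    [ '[^\W\d]'                => qr{ [^\W\d] }x ],
    [ '[^\W\d_]'               => qr{ [^\W\d_] }x ],
    [ '[^\W\d_aeiouyAEIUOY]'   => qr{ [^\W\d_aeiouyAEIUOY] }x ],
);

printf "%-22s accepts: %s\n", $_->[0], accepted($_->[1]) for @steps;
```

Each line of output should list fewer characters than the one before, ending with only the consonants from the sample.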

    c:\@Work\Perl\monks>perl -wMstrict -le
    "my $s = '123 annn xyzzy wwwewww xxx9xxx vvv_vvv eieio p pp ppp 2015 vwxz vwxzpdq';
     ;;
     my $consonant = qr{ [^\W\d_aeiouyAEIUOY] }xms;
     ;;
     printf qq{'$_' } for $s =~ m{ $consonant{4,} }xmsg;
    "
    'vwxz' 'vwxzpdq'

    All this is easier to manage, IMHO, with POSIX character classes or Unicode properties (if you're brave enough to venture out onto the thin, slippery ice of Unicode); both the following definitions work the same in the code above:
        my $consonant = qr{ [^[:^alpha:]aeiouyAEIUOY] }xms;
        my $consonant = qr{ [^\P{PosixAlpha}aeiouyAEIUOY] }xms;
    YMMV. See perlrecharclass, perluniprops.
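One way to check that the three definitions agree is to compare what each accepts over the ASCII range. This is a rough sketch, not a proof: the hash keys and the helper `ascii_matches` are names invented for the example, and the comparison is deliberately limited to code points 0..127 because \p{PosixAlpha} is ASCII-only while \w and [:alpha:] can match more under Unicode rules.

```perl
use strict;
use warnings;

# The three consonant definitions from the post, under illustrative names.
my %definition = (
    negated_word => qr{ [^\W\d_aeiouyAEIUOY] }xms,
    posix_class  => qr{ [^[:^alpha:]aeiouyAEIUOY] }xms,
    unicode_prop => qr{ [^\P{PosixAlpha}aeiouyAEIUOY] }xms,
);

# Collect the ASCII characters (code points 0..127) a definition accepts.
sub ascii_matches {
    my ($re) = @_;
    return join '', grep { /\A$re\z/ } map { chr } 0 .. 127;
}

for my $name (sort keys %definition) {
    printf "%-13s %s\n", $name, ascii_matches($definition{$name});
}
```

If the definitions really are interchangeable for ASCII input, all three lines should show the same string of upper- and lower-case consonants.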

    (See also the experimental Extended Bracketed Character Classes of version 5.18+; I can't give any examples using these ATM.)
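For the curious, here is a tentative sketch of how the same consonant class might be written as an extended bracketed character class. It assumes Perl 5.18 or later, silences the experimental warning that those versions emit, and uses \p{PosixAlpha} so the class stays ASCII-only; the variable `@found` is just an illustrative name, and the behaviour may vary between Perl versions.

```perl
use 5.018;
use strict;
use warnings;
no warnings 'experimental::regex_sets';    # the feature was experimental in 5.18

my $s = '123 annn xyzzy wwwewww xxx9xxx vvv_vvv eieio p pp ppp 2015 vwxz vwxzpdq';

# ASCII letters minus the chosen vowels, written as a set difference.
my $consonant = qr{ (?[ \p{PosixAlpha} - [aeiouyAEIOUY] ]) }xms;

# Runs of four or more consonants, as in the one-liner above.
my @found = $s =~ m{ (?: $consonant ){4,} }xmsg;
print join(' ', map { "'$_'" } @found), "\n";
```

The point of the construct is that the "subtraction" is spelled out with an explicit - operator instead of being encoded through a double negation.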


    Give a man a fish:  <%-(-(-(-<

      I agree with you, doubly-negated character classes can be tricky but can also be very useful. I was really reacting to the patterns proposed by Anonymous Monk and by toolic which were just not quite right.

        If a digit is not a vowel - is it a consonant? Unclear spec - hehehe. Easy fix:

        [^\WaeiouyAEIOUY0-9_]

        It's a valid solution if it passes the test cases. What? There were no test cases? Nevermind :)