cazz has asked for the wisdom of the Perl Monks concerning the following question:

In an effort to make my regexps easier to read for my coworkers who don't speak regexp, I've been using POSIX character classes as much as possible.

Today, I ran into an instance that perlre(1) says should work but does not.

   You can negate the [::] character classes by prefixing the class name with a '^'. This is a Perl extension.

    #!/usr/bin/perl
    my $a = "123{12\n";
    print "ACK digit!\n" if ($a =~ /[:^digit:]/);
    print "ACK \\D!\n" if ($a =~ /\D/);

Both statements should bitch about having a non-digit character in $a. However, only the \D method bitches. Am I correct in thinking this is a bug?

Re: negating POSIX regexp classes doesn't work as expected
by Roy Johnson (Monsignor) on Apr 11, 2005 at 18:49 UTC
    1. use POSIX;
    2. Double your brackets. The outer set tells perl it's a character class; the inner set is part of the POSIX coding for using named charsets.
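
A minimal sketch of the difference the doubled brackets make. The helper name `check` and the sample strings are made up for this illustration. With single brackets, the pattern is an ordinary character class holding the characters `:`, `^`, `d`, `i`, `g` and `t`; with doubled brackets it is the negated POSIX class.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper for this illustration: returns two flags,
# (single-bracket form matched, doubled-bracket form matched).
sub check {
    my ($string) = @_;

    # Silence any warning perl may raise about the suspicious single-bracket form.
    no warnings 'regexp';

    # Ordinary class: matches one of the characters : ^ d i g t
    my $single = $string =~ /[:^digit:]/ ? 1 : 0;

    # Negated POSIX class inside a bracketed class: matches any non-digit
    my $double = $string =~ /[[:^digit:]]/ ? 1 : 0;

    return ( $single, $double );
}

for my $string ( "123{12\n", "12345", "1:2" ) {
    my ( $single, $double ) = check($string);
    ( my $shown = $string ) =~ s/\n/\\n/g;    # make the newline visible
    printf "%-10s single=%d double=%d\n", $shown, $single, $double;
}
```

For the original string `"123{12\n"` the single-bracket form does not match, because none of `:^digt` occur in it, while the doubled form matches on the `{`.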

    Caution: Contents may have been coded under pressure.

      I don't think one needs use POSIX, at least with the more recent versions of perl; e.g. with v5.8.4:

      #!perl -l
      use strict;
      use warnings;

      my $s = 'a1b2c3';
      print for $s =~ /[[:alpha:]]/g;
      print for $s =~ /[[:digit:]]/g;
      __END__
      a
      b
      c
      1
      2
      3

      the lowliest monk

        You are correct, sir. I had thought it was use POSIX that caused it to bark at me when I didn't have the doubled brackets (POSIX syntax [: :] belongs inside character classes in regex), but it seems that it will bark without it. In either case, it only barks for positive char classes like [:digit:], not for negated ones like [:^digit:].

        Caution: Contents may have been coded under pressure.
Re: negating POSIX regexp classes doesn't work as expected
by tlm (Prior) on Apr 11, 2005 at 18:53 UTC

    POSIX character classes should be used inside a bracketed character class, like this: [xyz[:class:]XYZ]; the inner [: and :] are part of the POSIX class name, not a character class of their own.

    This is what you want:

    #!/usr/bin/perl
    my $a = "123{12\n";
    print "ACK non-digit!\n" if ($a =~ /[[:^digit:]]/);
    print "ACK \\D!\n" if ($a =~ /\D/);

    the lowliest monk

Re: negating POSIX regexp classes doesn't work as expected
by ikegami (Patriarch) on Apr 11, 2005 at 19:09 UTC
    Keep in mind that [\d] and [[:digit:]] can match things other than 0-9. On Unicode strings they match any decimal digit, such as ARABIC-INDIC DIGIT ONE (U+0661). You could just use [^0-9] to mean "match a character that's not 0, 1, ... or 9".
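
A small sketch of that point, assuming a perl with Unicode string support. The helper name `digit_flags` and the test characters are chosen for this illustration only. It compares `\d`, `[[:digit:]]` and `[0-9]` against an ASCII digit, a non-ASCII decimal digit, and a letter.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns match flags for (\d, [[:digit:]], [0-9])
# when applied to a single character.
sub digit_flags {
    my ($char) = @_;
    return (
        ( $char =~ /\d/          ? 1 : 0 ),
        ( $char =~ /[[:digit:]]/ ? 1 : 0 ),
        ( $char =~ /[0-9]/       ? 1 : 0 ),
    );
}

# ASCII "1", ARABIC-INDIC DIGIT ONE, and a plain letter.
for my $char ( '1', "\x{0661}", 'x' ) {
    # Print the code point rather than the character to avoid wide-character output.
    printf "U+%04X  \\d=%d  [[:digit:]]=%d  [0-9]=%d\n",
        ord($char), digit_flags($char);
}
```

Only `[0-9]` rejects U+0661. On perl 5.14 and later, the `/a` modifier is another way to restrict `\d` and `[[:digit:]]` to ASCII.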
Re: negating POSIX regexp classes doesn't work as expected
by ww (Archbishop) on Apr 11, 2005 at 18:52 UTC
    Updated to correct language. Testing a few additions to your code (simple ones: no refactoring, additional conditions, etc.):
    #!/usr/bin/perl
    my $a = "123{12\n";
    print "ACK digit! per cazz posix\n" if ($a =~ /[:^digit:]/);
    print "\npast cazz posix\n";
    print "ACK digit! per ww-digit-posix\n" if ($a =~ /[^:digit:]/);
    print "\tpast ww posix\n";
    print "ACK digit! per ww-ISdigit-posix\n" if ($a =~ /[^:isdigit:]/);
    print "\tpast ww ISdigit-posix\n";
    print "ACK ISdigit! per cazz-ISdigit-posix\n" if ($a =~ /[:^isdigit:]/);
    print "\npast cazz ISdigit posix\n";
    print "ACK \\D!\n" if ($a =~ /\D/);
    Output from running perl cazz.pl on the command line:

    past cazz posix
    ACK digit! per ww-digit-posix
    past ww posix
    ACK digit! per ww-ISdigit-posix
    past ww ISdigit-posix

    past cazz ISdigit posix
    ACK \D!

    But:
    print "ACK digit! per ww-ISdigit-posix\n" if ($a =~ /[[^:isdigit:]]/); # changed; dbl brkts
    changes output:
    past cazz posix
    ACK digit! per ww-digit-posix
    past ww posix
    past ww ISdigit-posix <----

    past cazz ISdigit posix
    ACK \D!
Re: negating POSIX regexp classes doesn't work as expected
by chromatic (Archbishop) on Apr 11, 2005 at 18:45 UTC
    Am I correct in thinking this is a bug?

    Yes, but unfortunately it's yours:

    my $a = "123{12\n";
    print "ACK digit!\n" if ($a =~ /[^:digit:]/);
    print "ACK \\D!\n" if ($a =~ /\D/);

    The POSIX class names include the colons.

    Update: I wrote a bad test case. Sorry for the noise.

      Wrong two ways. The caret should be after the colon (Update: or just inside the outer brackets, in which case it's normal charset negation and not a POSIX extension), and the brackets need to be doubled. Did you test this? Did you try it with and without nondigits in the string?
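
To make the caret placement concrete, here is a rough sketch that tries the three spellings with and without non-digits in the string. The helper name `caret_forms` and the sample strings are invented for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns match flags for three spellings, in order:
#   [^:digit:]    plain negated list: any character except : d i g t
#   [^[:digit:]]  ordinary negation of a class containing the POSIX digit class
#   [[:^digit:]]  Perl's negated POSIX class inside a bracketed class
sub caret_forms {
    my ($string) = @_;

    # Silence any warning perl may raise about the single-bracket form.
    no warnings 'regexp';

    return (
        ( $string =~ /[^:digit:]/   ? 1 : 0 ),
        ( $string =~ /[^[:digit:]]/ ? 1 : 0 ),
        ( $string =~ /[[:^digit:]]/ ? 1 : 0 ),
    );
}

for my $string ( '123', '12a', 'dig' ) {
    printf "%-4s [^:digit:]=%d  [^[:digit:]]=%d  [[:^digit:]]=%d\n",
        $string, caret_forms($string);
}
```

The single-bracket form matches the all-digit string `123` (a digit is not one of `:digt`) and fails on `dig`, which is the opposite of a non-digit test. The two doubled-bracket forms agree with each other on all three strings.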

      Caution: Contents may have been coded under pressure.