Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to match a string of at most 1000 characters:
our %REGEX = ( TEXT_1000 => qr/^[.\s]{0,1000}$/, );
Obviously, it doesn't work. Can anyone offer me insight? Tossing . inside the [] brackets seems to negate the expression. The \s is there so I get the newlines. BTW, this does work:
qr/^.{0,1000}$/s,
My question is just why.

Replies are listed 'Best First'.
Re: stupid regex confusion
by ikegami (Patriarch) on Sep 25, 2006 at 21:48 UTC

    Regexp metacharacters (like .) lose their special meaning inside character classes, so [.\s] matches only a literal period or a whitespace character. You want
    qr/^.{0,1000}$/s
    or
    qr/(?s:^.{0,1000}$)/
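
To see the difference in practice, here is a minimal sketch that runs the original bracketed pattern and the suggested one against a few strings. The sample strings and the variable names ($class_re, $dot_re) are made up for illustration.

```perl
use strict;
use warnings;

# Inside [...] the dot is a literal period, so this class only
# accepts periods and whitespace characters.
my $class_re = qr/^[.\s]{0,1000}$/;

# With /s, an unbracketed dot matches any character, newline included.
my $dot_re = qr/^.{0,1000}$/s;

my @samples = ("", "...", ". .\n.", "hello", "line one\nline two");

for my $s (@samples) {
    (my $shown = $s) =~ s/\n/\\n/g;    # make newlines visible in the output
    printf "%-22s class=%-8s dot=%s\n",
        "'$shown'",
        ($s =~ $class_re ? "match" : "no match"),
        ($s =~ $dot_re   ? "match" : "no match");
}
```

Only the strings made up entirely of periods and whitespace get past the bracketed version, which is why ordinary text appeared to "not work".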

    In character classes,

    • A leading ^ will negate the character class.
    • You can use Perl double-quote escapes (e.g. \x12, \cZ).
    • $ and @ start variable interpolation (e.g. $var, @var, $var[4]).
    • You can use Perl character classes (e.g. \d, \w, \s).
    • You can use POSIX character classes (e.g. [:digit:], [:word:], [:space:]).
    • A dash can be used to create ranges (e.g. a-z).
    • A backslash (\) will cause the next character to be interpreted literally (e.g. \], \-, \\), apart from the escapes listed above.
    • Any other character (including ., * and +) will be matched literally.
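
The rules in the list above can be checked with a small table-driven sketch. Everything here (the $vowels variable, the sample inputs, the table layout) is an assumption made for the demonstration, with one single-character test per rule.

```perl
use strict;
use warnings;

my $vowels = "aeiou";    # hypothetical variable, used to show interpolation

my @checks = (
    # [ rule, input, pattern, expected result (1 = match) ]
    [ 'leading ^ negates',       'x',    qr/^[^abc]$/,      1 ],
    [ 'double-quote escape',     "\x12", qr/^[\x12]$/,      1 ],
    [ 'variable interpolation',  'e',    qr/^[$vowels]$/,   1 ],
    [ 'Perl class \d',           '7',    qr/^[\d]$/,        1 ],
    [ 'POSIX class [:digit:]',   '7',    qr/^[[:digit:]]$/, 1 ],
    [ 'dash makes a range',      'm',    qr/^[a-z]$/,       1 ],
    [ 'backslash makes literal', ']',    qr/^[\]]$/,        1 ],
    [ 'dot matches a period',    '.',    qr/^[.]$/,         1 ],
    [ 'dot is not a wildcard',   'x',    qr/^[.]$/,         0 ],
);

my $failures = 0;
for my $c (@checks) {
    my ($rule, $input, $re, $expect) = @$c;
    my $got = ($input =~ $re) ? 1 : 0;
    $failures++ if $got != $expect;
    printf "%-26s %s\n", $rule, $got ? "match" : "no match";
}
print "unexpected results: $failures\n";
```

Note that a POSIX class needs its own brackets inside the enclosing class, hence [[:digit:]].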

    Update: Added list.

Re: stupid regex confusion
by davido (Cardinal) on Sep 26, 2006 at 04:07 UTC

    The regex question has been answered by ikegami and graff, but I just thought I'd toss this into the mix.

    If all you want to do is "match" (validate) that a string has between zero and 1000 characters, you could also do this:

    if( defined( $string ) and length( $string ) <= 1000 ) { print "It's a match!\n"; }
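
As a rough sketch of how the length test compares with the regex, the snippet below wraps it in a helper and runs both on a few boundary cases. The helper name fits_1000 and the \z-anchored pattern are my own additions for the comparison, not part of the reply above. One thing it brings out: $ also matches just before a final newline, so the $-anchored regex accepts a 1001-character string that ends in "\n", while length() and \z do not.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the defined/length test.
sub fits_1000 {
    my ($string) = @_;
    return (defined($string) && length($string) <= 1000) ? 1 : 0;
}

my $strict_re = qr/\A.{0,1000}\z/s;    # \z matches only at the true end of string
my $dollar_re = qr/^.{0,1000}$/s;      # $ also matches before a final newline

my @samples = (
    [ 'empty string',            ""                   ],
    [ '1000 chars',              "a" x 1000           ],
    [ '1001 chars',              "a" x 1001           ],
    [ '1000 chars plus newline', ("a" x 1000) . "\n"  ],
);

for my $s (@samples) {
    my ($label, $str) = @$s;
    printf "%-24s length-test=%d  \\z=%d  \$=%d\n", $label,
        fits_1000($str),
        ($str =~ $strict_re ? 1 : 0),
        ($str =~ $dollar_re ? 1 : 0);
}
```

If a trailing newline must count toward the limit, the length test or the \z anchor is the safer choice.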

    Dave

Re: stupid regex confusion
by graff (Chancellor) on Sep 26, 2006 at 03:21 UTC
    "... btw, this does work: qr/^.{0,1000}$/s ... my question is just why."

    Because with the "s" modifier at the end of the regex, the wildcard "." (not within square brackets) will match anything including newlines.
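
A minimal sketch of that effect, using a made-up two-line string: the same pattern is tried without the modifier, with a trailing /s, and with the inline (?s:...) form.

```perl
use strict;
use warnings;

my $text = "first line\nsecond line";    # sample input with an embedded newline

my $without_s = qr/^.{0,1000}$/;         # "." stops at the newline
my $with_s    = qr/^.{0,1000}$/s;        # "." matches the newline too
my $inline_s  = qr/(?s:^.{0,1000}$)/;    # same thing, modifier scoped inline

for my $case ( [ 'without /s', $without_s ],
               [ 'with /s',    $with_s    ],
               [ '(?s:...)',   $inline_s  ] ) {
    my ($label, $re) = @$case;
    printf "%-12s %s\n", $label, ($text =~ $re ? "match" : "no match");
}
```

Without the modifier the dot cannot cross the embedded newline, and $ cannot match in the middle of the string, so the first pattern fails on multi-line input.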

    As explained in the first reply, when you put "." inside square brackets as part of a character class, it loses its status as a wildcard and matches only a literal period. If you insisted on using a square-bracketed character class to match between 0 and 1000 characters, any of the following would work, but all of them look pretty goofy compared to the simpler version that uses "." as a wildcard with the "s" modifier.

    # alternatives to qr/^.{0,1000}$/s :
    qr/^[\s\S]{0,1000}$/;
    qr/^[\d\D]{0,1000}$/;
    qr/^[\w\W]{0,1000}$/;
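
To check that these bracketed forms agree with the dot-plus-/s version, here is a small sketch that runs all four on the same inputs and counts disagreements. The sample strings and the %patterns table are assumptions for the demo, and each pattern carries a trailing $ so that it is anchored at both ends.

```perl
use strict;
use warnings;

my %patterns = (
    'dot with /s' => qr/^.{0,1000}$/s,
    '[\s\S]'      => qr/^[\s\S]{0,1000}$/,
    '[\d\D]'      => qr/^[\d\D]{0,1000}$/,
    '[\w\W]'      => qr/^[\w\W]{0,1000}$/,
);

my @samples = ("", "two\nlines", "x" x 1000, "x" x 1001);

my $disagreements = 0;
for my $s (@samples) {
    # Collect the distinct outcomes (1 or 0) across all four patterns.
    my %seen;
    $seen{ ($s =~ $patterns{$_}) ? 1 : 0 }++ for keys %patterns;
    $disagreements++ if keys(%seen) > 1;
    printf "length %4d: %s\n", length($s),
        join(", ", map { $_ ? "match" : "no match" } sort keys %seen);
}
print "disagreements: $disagreements\n";
```

A class such as [\s\S] pairs a set with its complement, so it covers every character without needing the "s" modifier.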