Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello, monks. I'm trying to figure out if my regex is too redundant and if there are any ways to optimize it.
while (<>){ # grab: ween{(commands)} if (/^ween\s*\{{1}\s*(.*\;){1}\s*\}{1}$/){ $cmd = $1; # avoid: .!+=[]*<>$%@?&:/\{} command in syntax if ( $cmd=~/[.\!\+=\[\]\*<>\$\%\@\?\&:\/\\\{\}]/g){ print "$0:Found illegal character!\n"; } else { print "$cmd\n"; } } else { print "$0:Incorrect Syntax!!\n"; } }

Replies are listed 'Best First'.
Re: regex optimization
by Hofmator (Curate) on Aug 08, 2001 at 18:56 UTC

    Yup, there are ways to improve:

    • Leave out the {1}, this is totally redundant, everything is matched exactly once unless you specify otherwise. This leaves us with: /^ween\s*\{\s*(.*\;)\s*\}$/
    • You don't have to escape { unless it is ambigous. And no need to escape ; so: /^ween\s*{\s*(.*;)\s*}$/
    • Metacharacters loose their special meaning in a character class, so no need to escape them there. With the exception of the closing square bracket and the forward slash (used as delimiter):
      $cmd =~ /[.!+=[\]*<>$%@?&:\/\\{}]/g # also possible but I prefer the alternative above $cmd =~ /[].!+=[*<>$%@?&:\/\\{}]/g
    • You can use different delimiters (then it's no longer necessary to escape the forward slash) but then can no longer leave out the m// of the pattern match operator. The /g modifier is useless, too, it's enough to know if the pattern matched or not: $cmd =~ m#[.!+=[\]*<>$%@?&:/\\{}]#

    -- Hofmator

Re: regex optimization
by Anonymous Monk on Aug 08, 2001 at 18:54 UTC
    You have a few {1}'s in your first regex which of course do nothing since matching 1 of something is the default. Since the single .* is the only part of the regex that can match one of the illegal characters you can reduce it to a single regex by replacing .* with [^.\!\+=\[\]\*<>\$\%\@\?\&:\/\\\{\}]*. Also the /g modifier on the second (and now redundant) regex isn't doing anything useful that I can see.
Re: regex optimization
by tachyon (Chancellor) on Aug 08, 2001 at 19:06 UTC

    A few comments. First {1} is useless as a literal char in a regex means {1} of. Next adding \s* at the end allows trailing whitepspace which is common. Next - it is better to specify what is legal than what is illegal as your regex will probably not catch all the nasty chars. Just allow through what you need by adding them to the $valid class. Using this method you know all the chars that are valid. If you drop the syntax check you can do the match/untaint in one line. Oh and you don't need to \ the ; as the 12 regex metachars are $^*()+{[|\.? Finally the /g does nothing as you are insisting on a match from the begining of the line to a newline (which in this case will be the end of the line - unless you have been messing with $/ :-)

    my $valid = qr/[\w\s]/; while (<DATA>){ # grab: ween{(commands)} if (/^ween\s*\{\s*($valid+;)\s*\}\s*$/) { my $cmd = $1; print "Found $cmd\n"; } else { print "$0: Illegal chars or incorrect syntax!\n"; print "Line: $_\n"; } } __DATA__ ween { foo bar; } ween { del *.*; }

    cheers

    tachyon

    s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print