in reply to Analyzing regular expression performance

I don't have a direct answer to your question, but posting some examples of these "hairy regexes" (great mental picture) might help others reply.

---
It's all fine and dandy until someone has to look at the code.

Re^2: Analyzing regular expression performance
by Sprad (Hermit) on May 12, 2006 at 16:10 UTC
    Here's the one that inspired me to ask:

        $sql = " create table #foo (variable1 char(20), variable2 char(30) null) /* comments make it slow */ ";

        # Simple match of C-style comment
        my $comment = qr/(?:\/\*.*?\*\/)/;
        # This just looks for /* comments */, with whitespace on either side.
        # Since comments can be anywhere, use this anywhere whitespace is allowed.
        # The parens don't capture, so it won't affect your $1, $2 counts.
        my $sp = qr/(?:\s*$comment?\s*)*/;
        # "create table foo (", where table name may have a # mark
        my $header = qr/create${sp}table${sp}\#?\w+${sp}\(${sp}/i;
        # Variable name, can contain # mark
        my $name = qr/[\#\w]+${sp}/;
        # Variable type (e.g. 'char', 'number')
        my $type = qr/\w+${sp}/;
        # Bounds for array types, can be multidimensional (e.g. '(10, 20)', '(30)')
        my $op_char = qr/(?:\([\d\s,]+\)${sp})?/;
        # Text after variable type -- may be 'null', 'not null', a 'check' block, etc.
        my $words = qr/(?:[\w]+(?:\([^\)]*\))?[${sp})]*)*/;
        # Substitution placeholder -- some text in {curlies}
        my $subst = qr/${sp}\{[^\}]+\}${sp}/;
        # A complete variable declaration
        my $decl = qr/($name $type $op_char $words)/x;
        # Literal comma, with possible trailing space
        my $comma = qr/,${sp}/;
        # Closing the block with a close-paren
        my $rest = qr/\)${sp}/;

        if ($sql =~ / ($header)
                      ((?: $decl $comma | $subst $comma | $subst)*
                       (?: $decl | $subst))
                      ($rest) /x) {
            print "Match!\n";
            print "1: >>>$1<<<\n";
            print "2: >>>$2<<<\n";
        }
    It takes about 45 seconds on my box. The longer the comment, the longer the runtime.
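    To put numbers on "the longer the comment, the longer the runtime", one option is to time the same pattern against the same SQL while changing only the comment length. This is a minimal sketch, assuming the core Time::HiRes module for sub-second timing; the helpers sql_with_comment and time_match, the $full variable, and the all-'a' comment text are hypothetical, made up for this illustration. Timings depend on the machine and the perl version, and the lengths are kept short on purpose because the runtime climbs quickly.

```perl
use strict;
use warnings;
# Some perls warn that $sp's (?:...)* "matches null string many times";
# silence that one category so the timing output stays readable.
no warnings 'regexp';
use Time::HiRes qw(time);    # core module; time() returns fractional seconds

# Sub-patterns copied unchanged from the post.
my $comment = qr/(?:\/\*.*?\*\/)/;
my $sp      = qr/(?:\s*$comment?\s*)*/;
my $header  = qr/create${sp}table${sp}\#?\w+${sp}\(${sp}/i;
my $name    = qr/[\#\w]+${sp}/;
my $type    = qr/\w+${sp}/;
my $op_char = qr/(?:\([\d\s,]+\)${sp})?/;
my $words   = qr/(?:[\w]+(?:\([^\)]*\))?[${sp})]*)*/;
my $subst   = qr/${sp}\{[^\}]+\}${sp}/;
my $decl    = qr/($name $type $op_char $words)/x;
my $comma   = qr/,${sp}/;
my $rest    = qr/\)${sp}/;

# The post's if-condition as a reusable qr// ($full is a hypothetical name).
my $full = qr/ ($header) ((?: $decl $comma | $subst $comma | $subst)* (?: $decl | $subst)) ($rest) /x;

# Hypothetical helper: the post's SQL with a comment of $len 'a' characters.
sub sql_with_comment {
    my ($len) = @_;
    my $body = 'a' x $len;
    return " create table #foo (variable1 char(20), variable2 char(30) null) /* $body */ ";
}

# Hypothetical helper: returns (1 or 0 for match, elapsed wall-clock seconds).
sub time_match {
    my ($re, $str) = @_;
    my $t0 = time();
    my $ok = $str =~ $re ? 1 : 0;
    return ($ok, time() - $t0);
}

# Time the full pattern at a few comment lengths and print each result.
my @results;
for my $len (2, 4, 6, 8) {
    my ($ok, $secs) = time_match($full, sql_with_comment($len));
    push @results, [$len, $ok, $secs];
    printf "comment length %d: match=%d, %.4f s\n", $len, $ok, $secs;
}
```

    If each step up in comment length multiplies the time instead of adding a roughly constant amount, that is the usual signature of catastrophic backtracking. To see which subpattern the engine keeps retrying, use re 'debug'; prints the compiled program and a step-by-step match trace to STDERR. Try it on the shortest comment only, since the trace grows as fast as the runtime.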

    ---
    A fair fight is a sign of poor planning.