ColonelPanic has asked for the wisdom of the Perl Monks concerning the following question:

Hi All,

I posted a question earlier about this, but now I have narrowed in on the cause of the problem. The code below is a simplified parser of C conditionals. Simply enter a conditional, such as foo && (bar || baz) on the command line, and it will create a tree-like structure. Well, for a simple expression (no parentheses), this works fine. For a complicated expression that causes treeify() to recurse, it still works, but the interpreter crashes afterward, apparently during garbage collection.

Can anyone see what I have done wrong? I have no clue what the problem is. I appreciate all help greatly.

I am running ActiveState v.5.8.3 on Win32.

Thanks, ColonelPanic
#!/usr/bin/perl -w use strict; { my $exp; chomp ($exp = <STDIN>); init_regexes(); my $foo = treeify($exp); use Data::Dumper; print Dumper($foo); } my $TERM = ""; my $FUNCTION_CALL = ""; my $ATOM = ""; my $PAREN_EXP = ""; my $CONDITIONAL = ""; my $PAREN_COND = ""; my $OPERATOR = ""; my %OPS = ( '&' => 1, '|' => 2, '&&' => 3, '||' => 4, ); #regexes to parse C conditionals (more or less). sub init_regexes { $TERM = qr/ (?: (?: (??{$PAREN_COND}) ) | (?: (??{$FUNCTION_CALL}) ) | (?: (??{$ATOM}) ) ) /x; $PAREN_EXP = qr/ \( \s* (?: (?> [^()&|]+ ) | (??{$PAREN_EXP}) ) \s* \) /x; $ATOM = qr/ (?: (?: (?: [!*&] | \b ) \w+\b ) | (??{$PAREN_EXP}) ) /x; $FUNCTION_CALL = qr/ \b\w+\s? \( \s*(??{$TERM})\s* (?: ,\s*(??{$TERM})\s* )* \) /x; $OPERATOR = qr/ (?: && | \|\| | & | \| | \^ ) /x; $CONDITIONAL = qr/ ((??{$TERM})) (?: \s* ( (??{$OPERATOR}) ) \s* (?: ((??{$TERM})) ) )+ /x; $PAREN_COND = qr/ !? \( (?: (??{$CONDITIONAL}) ) \) /x; } #make a tree out of a C expression. sub treeify { my $exp = shift; my $tree; #find the first term in the string; put in $1 and delete. while ($exp =~ s/^($TERM)//) { my $term = $1; my $type = 'atom'; if ($term =~ /\s*($CONDITIONAL)\s*/) { $term = $1; $type = 'cond'; } my $new_term = { 'type' => $type, #if the term itself should be a node, process it. 'value' => $term, 'ref' => undef }; if ($type eq 'cond') { $new_term->{'ref'} = treeify($term); } push @{ $tree->{'terms'} }, $new_term; #now find, put in $1, and delete the next operator. if ($exp ne '' and $exp =~ s/^\s*($OPERATOR)\s*//) { $tree->{'op'} ||= $1; } else { return $tree; } } }


When's the last time you used duct tape on a duct? --Larry Wall

Replies are listed 'Best First'.
Re: Recursion Woes
by itub (Priest) on May 11, 2004 at 22:02 UTC

    There have been times where I thought of interesting solutions to a problem by using executable code within regular expressions. The problem is that as soon as I start doing something "real" I find mysterious crashes, which I've never been able to reproduce with simple test cases. My conclusion was that, for the time being, it's better to stay away from fancy experimental features in regexes and just write more code (such as a state machine) outside the regex.

    As an alternative, you can consider using a parser generator such as Parse::RecDescent or Parse::Yapp.

Re: Recursion Woes
by hv (Prior) on May 11, 2004 at 21:52 UTC

    Can you give a specific test-case that causes it to crash? I tried a couple of examples with 5.8.3 under linux and didn't get a problem.

    In general, lexical variables tend to cause problems within deferred evals: I suspect that by the time the evals get compiled perl has already decided that the file-scope lexicals are not referred to, and thrown them away.

    So, once you have a test-case, I'd suggest replacing 'my' with 'our' for all those file-scope variables (from $TERM to %OPS), and see if the test-case then passes.

    Hugo

      Thanks for your suggestions. See the post above for an example that causes a crash. I did try using our, but it didn't make a difference. The lexicals are not getting thrown away, because the program produces the correct output. It is just a matter of the interpreter crashing after the script is finished, probably during garbage collection.


      When's the last time you used duct tape on a duct? --Larry Wall
Re: Recursion Woes
by Plankton (Vicar) on May 11, 2004 at 21:40 UTC
Re: Recursion Woes
by dave_the_m (Monsignor) on May 11, 2004 at 21:48 UTC
    Can you supply some specific input that it crashes with; I'm always interested fixing coredump bugs.

    Dave.

      Any expression that causes recursion will crash it. For example:
      foo && (bar || baz)


      When's the last time you used duct tape on a duct? --Larry Wall
        Unfortunately I can't reproduce it. Ah well, never mind.