hanenkamp has asked for the wisdom of the Perl Monks concerning the following question:

I've written a simple grammar for parsing SQL-like filter expressions. This is for the Persist library and for the Persist::Filter module in particular. (This is a library I wrote that bears some resemblance to Alzabo and an older Persistent library.) (Also, please excuse my verbosity; this problem has been somewhat difficult to explain concisely.)

Anyway, the library itself seems fine, but when I'm using the library in another project, I am getting some odd results. The grammar--in the latest version I'm working on--looks like this:

    filter: expression end_of_file { $return = $item[1] }
          | { $Persist::Filter::errors .= "ERROR: Could not parse filter.\n";
              foreach (@{$thisparser->{errors}}) {
                  $Persist::Filter::errors .= "ERROR Line $_->[1]: $_->[0]\n";
              }
              $thisparser->{errors} = undef;
            }

    end_of_file: /^\Z/

    expression: comparison logical_operator <commit> expression
                { $return = new Persist::Filter::Junction(@item{qw(comparison logical_operator expression)}) }
              | comparison
              | /not/i <commit> expression
                { $return = new Persist::Filter::Not(lc($item[1]), $item{expression}) }
              | <error?> <reject>

    comparison: first_operand <commit> comparison_operator second_operand
                { $return = new Persist::Filter::Comparison(@item{qw(first_operand comparison_operator second_operand)}) }
              | '(' <commit> expression ')'
                { $return = $item{expression} }
              | <error?> <reject>

    first_operand: operand

    second_operand: operand

    operand: identifier | literal | placeholder

    logical_operator: /and/i { $return = lc($item[1]) }
                    | /or/i { $return = lc($item[1]) }

    comparison_operator: '=' | '<>' | /<=?/ | />=?/
                       | /(?:not\s+)?i?like/i
                         { $item[1] =~ s/\s+/ /; $return = lc($item[1]) }

    identifier: table_name '.' <commit> column_name
                { $return = new Persist::Filter::Identifier("$item{table_name}.$item{column_name}") }
              | integer '.' <commit> column_name
                { $return = new Persist::Filter::Identifier("$item{table_name}.$item{column_name}") }
              | column_name
                { $return = new Persist::Filter::Identifier($item{column_name}) }
              | <error?> <reject>

    table_name: name

    column_name: name

    literal: string
             { $return = new Persist::Filter::String($item{string}) }
           | number
             { $return = new Persist::Filter::Number($item{number}) }

    placeholder: '?'
                 { $return = new Persist::Filter::Placeholder('?') }

    name: /[a-z_][a-z0-9_]*/i

    string: "'" <commit> character(s) "'"
            { $return = "'".(join '', @{$item{'character(s)'}})."'" }
          | <error?> <reject>

    integer: /\d+/

    number: /[+-]?[0-9]*\.[0-9]+(?:e[+-]?[0-9]+)?/i
            { $return = lc($item[1]) }
          | /[+-]?[0-9]+\.?(?:e[+-]?[0-9]+)?/i
            { $return = lc($item[1]) }

    character: "\\\\'" { $return = "\\\\'" }
             | /[^']/

There is no problem with the grammar itself, but Parse::RecDescent seems to have some problem when I generate the parser in different situations. Specifically, if I use the parser from the command-line like this:

    perl -MPersist::Filter=parse_filter,parse_errors -MData::Dumper -e \
      "(\$ast = parse_filter(q(namespace = 'http://hanenkamp.com/Contentlet/Blank'))) or die(parse_errors); print Dumper(\$ast)"

I have this result:

    $VAR1 = bless( [
                     bless( do{\(my $o = 'namespace')}, 'Persist::Filter::Identifier' ),
                     '=',
                     bless( do{\(my $o = '\'http://hanenkamp.com/Contentlet/Blank\'')}, 'Persist::Filter::String' )
                   ], 'Persist::Filter::Comparison' );

However, in another project, which runs under mod_perl and goes through Persist::Driver::Memory to parse the exact same string, I get:

    [Thu Oct 16 11:05:08 2003] [error] [client 127.0.0.1] Unable to create starter database 'Contentment/Starter/apache-test.pl': Cannot parse filter "namespace = 'http://hanenkamp.com/Contentlet/Blank'": ERROR: Could not parse filter.
     in /home/sterling/projects/contentment/Contentment/ApacheDirector/t/../../../blib/lib/Contentment/Manager.pm on line 935.
    Compilation failed in require at (eval 108) line 1.

I've not been able to find a difference between the generated parsers as yet. However, when I set $::RD_TRACE, running from the command-line results in lines starting with Treating "filter:" as a rule declaration and Treating "|" as a new production. Yet, in the mod_perl environment, these same lines all say Treating "" as a rule declaration and Treating "" as a new production.

Furthermore, when the parser is run, I find that a similar nullification of return values is occurring in the mod_perl environment that isn't happening on the command-line.

This is very weird. I'm still working out the details, but has anyone encountered this before, or does anyone have an idea of where I should look for the problem? I've written several parsers for Parse::RecDescent before and never experienced a problem like this.

My next recourse is to attempt to generate a PM file within each environment and compare the definitions to see if there are any differences. Anyway, when I find the solution, I will be sure to post it here, in case anyone else has a similar problem.

Replies are listed 'Best First'.
Re: Where to find the source of this Parse::RecDescent oddness?
by hanenkamp (Pilgrim) on Oct 17, 2003 at 02:22 UTC

    UPDATE:

    I've generated precompiled versions of both grammars from each environment. I'm assuming (have not checked yet) that Precompile and new generate identical grammar objects. (If someone more familiar with the internals knows whether this assumption holds, I'd be glad to know for certain before digging through the code myself.) The code being generated is identical in both cases--as I expected.

    The differences I can detect appear wherever the magic $& variable is used to examine the last match. The first disparity in trace output happens in this section of code:

        unless ($text =~ s/\A($skip)/$lastsep=$1 and ""/e and $text =~ s/\A(?:[a-z_][a-z0-9_]*)//i)
        {
            $expectation->failed();
            Parse::RecDescent::_trace(q{<<Didn't match terminal>>},
                Parse::RecDescent::_tracefirst($text))
                    if defined $::RD_TRACE;
            last;
        }
        Parse::RecDescent::_trace(
            q{>>Matched terminal<< (return value: [} . $& . q{])},
            Parse::RecDescent::_tracefirst($text))
                if defined $::RD_TRACE;
        push @item, $item{__PATTERN1__}=$&;

    As far as I can tell by reading the code, the value in $& is being lost somewhere between the unless and the _trace.
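
One way to check whether $& really is the value going missing is to run the same terminal match with an explicit capture group and compare the two results in each environment. The following is only a minimal sketch in core Perl: match_name is a hypothetical helper, not part of Parse::RecDescent, and the pattern is the name terminal from the grammar in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: strip a leading name token from the text and
# report the match two ways, via $& and via an explicit capture group.
sub match_name {
    my ($text) = @_;
    return unless $text =~ s/\A([a-z_][a-z0-9_]*)//i;

    # Copy both values immediately, before any other regex can run.
    my ($via_ampersand, $via_capture) = ($&, $1);
    return ($via_ampersand, $via_capture, $text);
}

my ($amp, $cap, $rest) = match_name("namespace = 'x'");
printf "\$& = [%s], \$1 = [%s], rest = [%s]\n",
    defined $amp ? $amp : 'undef',
    defined $cap ? $cap : 'undef',
    $rest;
```

On an unmodified perl the two values should agree. If they differ only under mod_perl, that would point at something in that process changing how $& behaves rather than at the grammar.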

    Anyway, still searching...

Re: Where to find the source of this Parse::RecDescent oddness?
by hanenkamp (Pilgrim) on Oct 17, 2003 at 17:06 UTC

    AHAH!

    I've discovered the problem. One module was using Regexp::Fields, which results in altered behavior for $&, something I had not realized. Since Regexp::Fields alters the underlying regular expression engine, it appears to affect every module, even those not using it.
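
For anyone chasing a similar problem, a quick sanity check along these lines can be run in each environment. It is a sketch in core Perl only, and both helper names (ampersand_intact and loaded_modules_matching) are hypothetical. It assumes the interference shows up as $& not holding the matched text right after a successful match, which is what I saw here.

```perl
use strict;
use warnings;

# Returns 1 if $& holds the matched text right after a successful match.
sub ampersand_intact {
    my $probe = 'namespace = 1';
    return 0 unless $probe =~ /\A[a-z_][a-z0-9_]*/i;
    my $seen = $&;
    return (defined $seen && $seen eq 'namespace') ? 1 : 0;
}

# Hypothetical helper: list loaded module files (keys of %INC, such as
# "Regexp/Fields.pm") whose name matches the given pattern.
sub loaded_modules_matching {
    my ($pattern) = @_;
    my @hits = grep { $_ =~ $pattern } keys %INC;
    return sort @hits;
}

print ampersand_intact() ? "\$& looks intact\n" : "\$& has been altered\n";
print "loaded: $_\n" for loaded_modules_matching(qr/^Regexp\b/);
```

Comparing the output from the command line against the output from inside the mod_perl process should show both whether $& is affected and which regex-related modules happen to be loaded there.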

    Unfortunately, this means that Regexp::Fields may be useless to me, so I'll have to find another way to implement the other part (or help the author with a patch).