zuma53 has asked for the wisdom of the Perl Monks concerning the following question:

Hi--

I'm working on a grammar to parse SQL, and here's some of what I have so far...

my $grammar = q {
    select       : /SELECT/i select_list /FROM/i table_source

    select_list  : column(s /,/)

    column       : identifier /=/ col_name
                 | col_name /AS/i alias
                 | col_name alias(?)
                 | string
                 | number

    table_source : table alias(?) with_hint(?) table_join(s?)
                 | table alias(?) table_join(s)
                 | table alias(?) /,/ table_list
                 | table alias(?)

    yada...yada...

And something simple to parse:

select colA, colB, colC from customer x

The problem I'm having is that when RecDescent gets to colC, it matches "from" as an **alias** instead of treating it as a token in the top-level **select** production. colA and colB work fine, because the comma effectively terminates the column. If I add a column alias to colC, everything works. So it seems that "from" is being sent down through select_list->column and matched there, not where I had intended.

Any ideas as to how I can fix this? Do I explicitly look for "col_name /from/i" in the column production and then reject it if it matches (and if so, what would that syntax be?), or should my grammar be structured differently?

Let me know if you want the entire script and not just the snippet I've posted.

Thanks!

Replies are listed 'Best First'.
Re: Parse::RecDescent greedy matches
by tobyink (Canon) on Jul 25, 2012 at 06:28 UTC

    "I'm working on a grammar to parse SQL"

    Any particular reason? Is this just a learning exercise, or are you doing it because you actually need to parse some SQL? If the latter, stop doing what you're doing, and use SQL::Statement instead.

    If you're just trying to learn PRD, then here's a hint... alias is probably defined something like this:

    alias: identifier
    identifier: /.../ # some regex

    To the production for identifier you need to add an action which rejects illegal identifiers:

    alias: identifier
    identifier: /.../ # some regex
        # $item[1] is the text the regex matched; an undef result fails the production
        { $item[1] =~ /^(?:select|insert|from|as)$/i ? undef : $item[1] }

    perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'
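To see the keyword check working end to end, here is a minimal sketch built on a cut-down grammar. It is not the poster's full grammar: the rule set is reduced to the bare minimum, the identifier regex `/[A-Za-z_]\w*/` and the helper name `parse_select` are assumptions made for illustration, and the reserved-word list is just the four words from the reply above.

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Cut-down, hypothetical grammar: only enough rules to show the keyword check.
my $grammar = q{
    select      : /SELECT/i select_list /FROM/i table alias(?)
                  { $return = { columns => $item{select_list}, table => $item{table} } }

    select_list : column(s /,/)

    column      : col_name alias(?)
                  { $return = $item{col_name} }

    col_name    : identifier
    table       : identifier
    alias       : identifier

    # An action whose value is undef makes the production fail, so a
    # reserved word is never accepted as an identifier or as an alias.
    identifier  : /[A-Za-z_]\w*/
                  { $item[1] =~ /\A(?:select|insert|from|as)\z/i ? undef : $item[1] }
};

my $parser = Parse::RecDescent->new($grammar)
    or die "Bad grammar\n";

# Hypothetical wrapper: returns a hashref on success, undef on failure.
sub parse_select {
    my ($sql) = @_;
    return $parser->select($sql);
}

my $result = parse_select('select colA, colB, colC from customer x');
print join(', ', @{ $result->{columns} }), ' <- ', $result->{table}, "\n"
    if $result;
```

With the check in place, `alias(?)` after colC should match zero times, because "from" is rejected as an identifier. That leaves "from" for the `/FROM/i` terminal in the `select` rule.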

      OK, thanks. I will give that a try.

      I couldn't discern how to alter the flow-of-control in RD; certainly this will help.

      I'm trying to parse MSSQL-specific queries with all its funky quirks to separate out columns/tables from the SQL verbiage.

        One more question on RD:

        If I have a production like this:

         foo : /tag/ '(' <things in parens> ')'

        I'd like to throw away all of the text between nested parens. I tried /.*/ but this overshoots and gobbles up the rest of the line.

        Any suggestions on how to approach this? Thanks.
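One way to skip a balanced, possibly nested, parenthesised chunk without overshooting is a recursive regex (Perl 5.10 or later). The sketch below is plain Perl outside any grammar. The helper name `skip_parens` and the sample input are made up for illustration, and whether the pattern can be dropped unchanged into a RecDescent terminal has not been checked here. It also assumes no parentheses appear inside quoted strings.

```perl
use strict;
use warnings;

# Named recursive group: an opening paren, then any mix of non-paren runs
# and nested balanced groups, then the matching closing paren.
my $balanced = qr/
    (?<paren>
        \(
        (?: [^()]++ | (?&paren) )*
        \)
    )
/x;

# Hypothetical helper: given text starting with "tag(", return the
# parenthesised chunk to discard and whatever follows it.
sub skip_parens {
    my ($text) = @_;
    return unless $text =~ /\A tag \s* $balanced /x;
    my $discarded = $+{paren};              # the whole outer (...) group
    my $rest      = substr $text, $+[0];    # text after the end of the match
    return ( $discarded, $rest );
}

my ( $discarded, $rest ) = skip_parens('tag(a, (b, c), (d (e))) and more');
print "discarded: $discarded\n";
print "remaining:$rest\n";
```

Unlike `/.*/`, the pattern stops at the closing paren that matches the first opening one, so the rest of the line is left for later rules. The core module Text::Balanced (`extract_bracketed`) is another option for the same job.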