dragonchild has asked for the wisdom of the Perl Monks concerning the following question:

I'm writing a parser for the very first time and I can't seem to resolve a reduce/reduce conflict the elegant way I want to use. (Yes, I'm writing my own SQL parser. Yes, I know there's modules that do this. Yes, I have a reason to do so.)
database_ident: ident ; table_ident: database_ident '.' ident | ident ; column_ident: table_ident '.' ident | ident ; ident: IDENT | IDENT_QUOTED ;

Parse::Yapp is complaining about 2 reduce/reduce conflicts whenever I try to use column_ident. If I manually expand table_ident and database_ident in the dependent rules, the reduce/reduce conflicts go away. But, I don't want to do that because I anticipate extending those rules over time and don't want to have the rule in multiple places. Does anyone have any suggestions?


My criteria for good software:
  1. Does it work?
  2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?

Replies are listed 'Best First'.
Re: (OT) Yapp rules and resolving reduce/reduce errors
by pc88mxer (Vicar) on Apr 10, 2008 at 19:33 UTC
    I've run across problems like this when writing parsers. The way I would do it is to liberalize the grammer and leave the checking of the ident to a later processing stage. For instance, I would probably use something like:
    ident : IDENT | INDENT_QUOTED | ident '.' IDENT | ident '.' IDENT_QUOT +ED ;
    and not have the grammar distinguish between table_ident, column_ident and database_ident. The checking of the proper kind of ident can be done later. This approach also generally makes it easier to recover from errors.

    Update: The reduce-reduce conflict is this: when trying to parse a column_ident and the current input token is ident with look-ahead of ., the following reductions are possible:

    database_ident -> ident # i.e. column_ident -> ident '.' ident '.' i +dent table_ident -> ident # i.e. column_ident -> ident '.' ident
    One solution would be to reverse the recursion:
    database_ident: ident ; table_ident: ident | ident '.' database_ident ; column_ident: ident | ident '.' table_indent ;
    Then when the look-ahead is '.' the grammar knows exactly what production to try to complete.
Re: (OT) Yapp rules and resolving reduce/reduce errors
by mr_mischief (Monsignor) on Apr 10, 2008 at 19:46 UTC
    Your problem, which I'm sure you've figured out, is that you have column_ident and table_ident reducing from the same token stream ( 'ident . ident' ).

    One way to deal with this is in a shift-reduce parser to break up your grammar into different nonterminals. Starting from the statement level down might help. Combining like nonterminals might, too.

    table_and_column: ident '.' ident | ident ;

    Then your statement can look something like:

    statement: stmt_type from_into table_and_column set table_and_column where | stmt_type from_into table_and_column ... ;
    (with suitable null options).

    There are open source and I think even public domain grammars for SQL. They should be available in both LL and LR. Perhaps you could start with one and translate it into YAPP instead of creating the grammar from scratch?

    One such SQL grammar, written in the EBNF notation of the Gold Parsing System, is apparently Ansi 89 compliant.

Re: (OT) Yapp rules and resolving reduce/reduce errors
by ikegami (Patriarch) on Apr 10, 2008 at 23:35 UTC
    The problem is that column_ident can match ident '.' ident in two different ways:
    • column_ident table_ident '.' ident database_ident '.' ident ident '.' ident
    • column_ident table_ident '.' ident ident '.' ident

    pc88mxer's solution to defer identifier validation until later is sound, but here's one that does it during parsing:

    database_ident: ident ; table_ident: ident '.' ident | ident ; column_ident: ident '.' ident '.' ident | ident '.' ident | ident ;
      That's the solution I'm using, but it means that any change to how database_ident is defined needs to be propagated by hand. Isn't there a way of saying "In the case of a reduce/reduce conflict, resolve in this fashion." ?

      My criteria for good software:
      1. Does it work?
      2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?

        No idea. I don't know anything about Yapp (or yacc, for that matter). If you really think database_ident is likely to change, you could go with:

        database_ident: ident ; table_ident: ident ; column_ident: ident ; qualified_database_ident: database_ident ; qualified_table_ident: database_ident '.' table_ident | table_ident ; qualified_column_ident: database_ident '.' table_ident '.' column_ident | table_ident '.' column_ident | column_ident ;
Re: (OT) Yapp rules and resolving reduce/reduce errors
by casiano (Pilgrim) on Apr 11, 2008 at 13:04 UTC
    Dear Dragonchild,

    An alternative subject to your design constraints (no expanding) is to move the conflict from reduce-reduce to shift-reduce and solve it using priorities:

    %start column_ident %left DUMMY %right '.' %% column_ident: table_ident '.' ident | table_ident ; table_ident: ident '.' database_ident | ident %prec DUMMY ; database_ident: ident ; ident: IDENT | IDENT_QUOTED ; %%
    Now, when I compile with Parse::Eyapp I have no conflicts

    pl@nereida:~/LEyapp/examples$ eyapp -v dc2.eyp pl@nereida:~/LEyapp/examples$
    I have written Parse::Eyapp that expands Parse::Yapp with capabilities similar to Parse::RecDescent. Consider using it.
    Hope it will help

    Best

    Casiano

Re: (OT) Yapp rules and resolving reduce/reduce errors
by casiano (Pilgrim) on Apr 12, 2008 at 09:39 UTC
    Mmmm ... In the previous solution above I changed your left recursive rule for table_ident by a right recursive rule. If you don't like this, you can use this alternative solution:
    pl@nereida:~/LEyapp/examples$ cat -n dc3.eyp 1 %start column_ident 2 3 %left DUMMY 4 %right '.' 5 6 %% 7 column_ident: 8 table_ident '.' ident 9 | table_ident 10 ; 11 12 table_ident: 13 database_ident '.' ident 14 | database_ident %prec DUMMY 15 ; 16 17 database_ident: 18 ident 19 ; 20 21 ident: 22 IDENT 23 | IDENT_QUOTED 24 ; 25 %%
    This solution also avoided conflicts: The only shift-reduce conflict is solved in favor of "shift" (that I supppose is what you need here):
    pl@nereida:~/LEyapp/examples$ eyapp -v dc3.eyp pl@nereida:~/LEyapp/examples$ head -15 dc3.output Conflicts: ---------- Conflict in state 1 between rule 4 and token '.' resolved as shift. Rules: ------ 0: $start -> column_ident $end 1: column_ident -> table_ident '.' ident 2: column_ident -> table_ident 3: table_ident -> database_ident '.' ident 4: table_ident -> database_ident 5: database_ident -> ident 6: ident -> IDENT 7: ident -> IDENT_QUOTED
    Best

    Casiano