in reply to nested parsing with Parse::RecDescent

I doubt that your problem is specifically with P::RD. It is likely with the recursive descent grammar you are feeding to P::RD.

Grammars are non-trivial. Learning to write them takes time. Thinking in non-binary trees is the key. For help with that, visit a good library, like one at a college, and look for books on compiler design or parsing theory. (I have a copy of the Dragon book whose lead author is Aho. The first chapter is especially helpful in thinking about these things.) These will help you learn to write grammars.

But, if you want help with your grammar from this site, you'll probably need to provide a bit more information.

I'll take a little guess, though. Perhaps you are talking about being able to match the Q:: block inside the outer Q:: block. There is nothing preventing you from listing a high-level non-terminal again inside its own definition, so long as the result is unambiguous and does not lead to initial (left) recursion. Consider the following trivial example:

#!/usr/bin/perl
use strict;
use warnings;

use Parse::RecDescent;
$::RD_TRACE = 1;

my $grammar = q{
    <autotree>
    program   : statement | block
    statement : IDENT IDENT ';'
    block     : IDENT '{' program '}'
    IDENT     : /[A-J]+/
};

my $parser = Parse::RecDescent->new( $grammar );

my $input = join '', <DATA>;
my $tree  = $parser->program( $input );

use Data::Dumper;
warn Dumper( $tree );

__DATA__
ABC {
    DEF {
        GHI ABD;
    }
}

Note that program is the top level rule, but appears again inside the only valid production for block. I'll leave it for you to turn this into a grammar for your language.
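
To make the recursion concrete, here is a hand-written sketch of the same three rules in plain Perl. It is not how Parse::RecDescent is implemented, and the sub names (token, statement, block, program) and the array-based tree shape are my own choices for illustration. The point is only that block calls program, which may in turn call block again.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# Hand-written sketch of the grammar (not Parse::RecDescent internals):
#   program   : statement | block
#   statement : IDENT IDENT ';'
#   block     : IDENT '{' program '}'
# Each rule takes the text and a start position and returns
# ( $tree, $new_position ) on success or an empty list on failure.

# Skip whitespace, then match $re at $pos; return ( $matched, $new_pos ).
sub token {
    my ( $text, $pos, $re ) = @_;
    pos($text) = $pos;
    $text =~ /\G\s*/gc;
    return $text =~ /\G($re)/gc ? ( $1, pos($text) ) : ();
}

sub statement {
    my ( $text, $pos ) = @_;
    my ( $first, $second, $semi );
    ( $first,  $pos ) = token( $text, $pos, qr/[A-J]+/ ) or return;
    ( $second, $pos ) = token( $text, $pos, qr/[A-J]+/ ) or return;
    ( $semi,   $pos ) = token( $text, $pos, qr/;/ )      or return;
    return ( [ 'statement', $first, $second ], $pos );
}

sub block {
    my ( $text, $pos ) = @_;
    my ( $name, $open, $body, $close );
    ( $name,  $pos ) = token( $text, $pos, qr/[A-J]+/ ) or return;
    ( $open,  $pos ) = token( $text, $pos, qr/\{/ )     or return;
    ( $body,  $pos ) = program( $text, $pos )           or return;  # the recursive step
    ( $close, $pos ) = token( $text, $pos, qr/\}/ )     or return;
    return ( [ 'block', $name, $body ], $pos );
}

# Try the first production, then fall back to the second.
sub program {
    my ( $text, $pos ) = @_;
    my @result = statement( $text, $pos );
    @result = block( $text, $pos ) unless @result;
    return @result;
}

my ($tree) = program( 'ABC { DEF { GHI ABD; } }', 0 );
print Dumper($tree);
```

Because both statement and block consume an IDENT before anything can recurse, every recursive call starts further along in the input, which is why this shape avoids the initial-recursion trap.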

Note that I used autotree to make an AST for me. That's not always helpful, since its trees have a few too many levels. I also turned on $::RD_TRACE, which walks through the parse step by step. This is all but essential for those of us who work on parsers in our spare time. Its output is unbelievably helpful, both in diagnosing errors in a grammar and in teaching you how recursive descent works.

Phil

Re^2: nested parsing with Parse::RecDescent
by loomis53 (Novice) on May 26, 2006 at 15:04 UTC
    Your assumptions were right on. I actually figured out my problem just before checking back here. My productions (I think that is the proper term) for Line were not prioritized properly, with the greediest first. I also had to modify my regex for Text. My grammar definition now looks like this:
    my $grammar = q{
        {my $re_type = qr /.::/ }
        Document    : Element(s)
        Element     : Header Body
        Header      : Element_Type Options(?)
        Element_Type: /$re_type/
        Options     : '(' Arg(s) ')'
        Arg         : /\b(\w+)\s*=\s*(\w+)\s*/xms
        Body        : '{' Line(s) '}' <commit>
        Line        : Element | Text Line | Text
        Text        :/[^\{\}]*\n/xms
    };
    My only concern now is that my current definition for Text is still not exactly right. Instead of matching everything but the braces, I would rather match everything up to, but not including, the closing brace or a nested Q:: element. I'm sure there's a way to do this, but I'm still working on mastering regex as well. All in good time, eh? Anyway, thanks for the advice, Philcrow, and thanks also to everyone else who offered helpful tips directly or indirectly related to the problem :)
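
One possible way to get "everything up to, but not including, a closing brace or a nested element" is a negative lookahead that is checked before each character. The sketch below is plain Perl and has not been tried inside the grammar above. It assumes a nested element always starts with a word character followed by `::` (as in Q::), which is narrower than the `/.::/` used for Element_Type, and the names `$re_text` and `leading_text` are made up for the example. Unlike the current Text rule, it does not require a trailing newline.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical replacement for the Text pattern: take one character at a
# time, as long as it is not a brace and is not the start of something
# that looks like an element type such as "Q::".
my $re_text = qr/ (?: (?! \w:: ) [^{}] )+ /xms;

# Return the run of plain text at the start of $input, or undef if none.
sub leading_text {
    my ($input) = @_;
    return $input =~ /\A($re_text)/ ? $1 : undef;
}

my $sample = "some plain text\nmore text Q:: { nested }";
print "[", leading_text($sample), "]\n";
```

With this sample the match stops just before the Q::, so the nested element is left in the input for the Element rule to pick up. If element types can begin with something other than a word character, the lookahead would need to be widened to match.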