loomis53 has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to create a grammar and need some guidance. I've got it 'almost' working, but need help with the nested parsing. Below is what I've got so far, with the example of what to parse under __DATA__.
#!perl -w
use strict;
use Parse::RecDescent;

$::RD_AUTOACTION = q { [@item[0..$#item]] };

my $grammar = q{
    Document    : Element(s)
    Element     : Header Body
    Header      : Element_Type Options(?)
    Element_Type: 'Q::' | 'T::'
    Options     : '(' Arg(s) ')'
    Arg         : /\b(\w+)\s*=\s*(\w+)\s*/xms
    Open_body   : '{'
    Close_body  : '}'
    Body        : Open_body Line(s) Close_body <commit>
    Line        : Element | Text | Text Line
    Text        : /[\s\w]*[^\}]/xms
};

my $survey_parser = Parse::RecDescent->new($grammar);

undef $/;
my $doc  = <DATA>;
my $tree = $survey_parser->Document($doc);
die "no trees" if ! $tree;

use Data::Dumper;
print Dumper($tree);

__DATA__
Q:: (arg1=val arg2=val) {
    Some text here
    T:: (arg=value) {
        Text here
    }
    Q:: (argn=etc) {
        T:: {
            optional args not included
        }
    }
}
THX for any help!!!

2006-05-26 Retitled by Arunbear, as per Monastery guidelines
Original title: 'P::RD help'

Re: nested parsing with Parse::RecDescent
by philcrow (Priest) on May 26, 2006 at 12:14 UTC
    I doubt that your problem is specifically with P::RD. It is likely with the recursive descent grammar you are feeding to P::RD.

    Grammars are non-trivial, and learning to write them takes time. Thinking in non-binary trees is the key. For help with that, visit a good library, such as a college library, and look for books on compiler design or parsing theory. (I have a copy of the Dragon Book, whose lead author is Aho. Its first chapter is especially helpful for thinking about these things.) These will help you learn to write grammars.

    But, if you want help with your grammar from this site, you'll probably need to provide a bit more information.

    I'll take a little guess, though. Perhaps you are talking about being able to match the Q:: block inside the outer Q:: block. There is nothing preventing you from listing a high-level non-terminal again inside its own definition, so long as the result is unambiguous and does not lead to left recursion (a production that begins with the very rule it defines). Consider the following trivial example:

    #!/usr/bin/perl
    use strict;
    use warnings;

    use Parse::RecDescent;

    $::RD_TRACE = 1;

    my $grammar = q{
        <autotree>
        program   : statement | block
        statement : IDENT IDENT ';'
        block     : IDENT '{' program '}'
        IDENT     : /[A-J]+/
    };

    my $parser = Parse::RecDescent->new( $grammar );

    my $input = join '', <DATA>;

    my $tree = $parser->program( $input );

    use Data::Dumper;
    warn Dumper( $tree );

    __DATA__
    ABC {
        DEF {
            GHI ABD;
        }
    }
    Note that program is the top-level rule, but it appears again inside the only valid production for block. I'll leave it to you to turn this into a grammar for your language.

    Note that I used <autotree> to build an AST for me. That's not always helpful, since its trees have a few too many levels. I also turned on $::RD_TRACE, which reports each step of the parse; this is all but essential for those of us who work on parsers in our spare time. Its output is unbelievably helpful, both for diagnosing errors in a grammar and for teaching you how recursive descent works.

    Phil

      Your assumptions were right on. I actually figured out my problem just before checking back here. My productions (I think that is the proper term) for Line were not ordered properly, with the greediest first. I also had to modify my regex for Text. My grammar definition now looks like this:
      my $grammar = q{
          { my $re_type = qr/.::/ }

          Document     : Element(s)
          Element      : Header Body
          Header       : Element_Type Options(?)
          Element_Type : /$re_type/
          Options      : '(' Arg(s) ')'
          Arg          : /\b(\w+)\s*=\s*(\w+)\s*/xms
          Body         : '{' Line(s) '}' <commit>
          Line         : Element | Text Line | Text
          Text         : /[^\{\}]*\n/xms
      };
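      Parse::RecDescent tries a rule's productions in the order they are listed and uses the first one that matches, not the longest, which is why putting the greedier Line productions first made a difference. As a rough analogy only (plain Perl regexes, not Parse::RecDescent itself), this sketch shows the same first-match behaviour with regex alternation; the input string is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Rough analogy (not Parse::RecDescent): regex alternation is also tried
# left to right, and the first alternative that lets the match succeed wins.
my $input = 'text more';    # made-up input

# Shorter alternative listed first: it matches, so 'text more' is never used.
my ($short_first) = $input =~ /\A(text|text more)/;

# Longer ("greedier") alternative listed first: it is tried first and matches.
my ($long_first) = $input =~ /\A(text more|text)/;

print "short first: [$short_first]\n";    # prints "short first: [text]"
print "long first:  [$long_first]\n";     # prints "long first:  [text more]"
```

      One difference worth keeping in mind: a regex will backtrack into an alternation if something later in the pattern fails, whereas Parse::RecDescent does not go back into a subrule that has already succeeded to try its remaining productions, so production order matters even more in a grammar.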
      My only concern now is that my current definition for Text is still not exactly right. Instead of matching everything but the braces, I would rather match everything up to, but not including, the closing brace or a nested Q:: element. I'm sure there's a way to do this, but I'm still working on mastering regexes as well. All in good time, eh? Anyway, thanks for the advice, Philcrow, and thanks also to everyone else who offered helpful tips, whether directly or indirectly related to the problem :)
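      One possible way to get the Text behaviour described above is a negative lookahead: consume characters one at a time, but only while the current character is not a brace and the text at that point does not start an element header. This sketch tries such a pattern in plain Perl before it goes into a grammar. It assumes, hypothetically, that a header is a single word character followed by "::" (as in Q:: and T::), which is narrower than the /.::/ used for Element_Type; the sample strings are invented.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical Text pattern: one or more characters, each of which is not
# a brace and does not begin an element header.
# Assumption: a header is one word character followed by "::" (Q::, T::).
my $re_text = qr/(?:(?!\w::)[^\{\}])+/;

# Invented samples of what might follow a '{' inside a Q:: block
# (leading whitespace already skipped).
my @samples = (
    "Some text here\n    T:: (arg=value) {",   # should stop before T::
    "Text here\n    }",                        # should stop before }
    "no header or brace at all",               # should take everything
);

my @texts;
for my $s (@samples) {
    # Anchored at the start, since a parser terminal must match at the
    # current position.
    my ($text) = $s =~ /\A($re_text)/;
    push @texts, $text;
    print "[$text]\n";
}
```

      If it behaves as wanted, the same pattern could replace the Text regex in the grammar, e.g. Text : /(?:(?!\w::)[^\{\}])+/. Because Parse::RecDescent skips leading whitespace before each terminal by default, indentation at the start of a line never reaches Text, but whitespace just before the next header or closing brace is still captured.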
Re: nested parsing with Parse::RecDescent
by blazar (Canon) on May 26, 2006 at 09:49 UTC
    I am trying to create a grammar and need some guidance. I've got it 'almost' working, but need help with the nested parsing. Below is what I've got so far, with the example of what to parse under __DATA__.

    Sorry, I can't help you, because I don't know Parse::RecDescent. However, I'd like to suggest how you could ask your question better. You say that your solution is "'almost' working", but you don't specify what exactly does not work. Also, you give an example of "what to parse" but you don't specify how you want it to be parsed. Last, had you not used the abbreviation P::RD in the subject (in the body of your post it would have been fine), you would have increased the chance that someone familiar with that module would notice it in, say, Newest Nodes. All in all, your question is better asked than many others one can find here, but I'd still point you to How do I post a question effectively?, which may be helpful.
