PRD parser problem: How to deal with multiple lines

Hanken has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, please see my simple script below. I need to capture the content of each section, but I fail to do so when the content spans multiple lines, for example:

DESCRIPTION = {
    UK BANK;
    UN BANK;
};

The section content is everything between the section name and the section end.

#! /usr/local/bin/perl -sw

BEGIN {
        close STDERR and open STDERR, '>./STDERR' or die $!;
}
use Parse::RecDescent;
#use strict;
#use warnings;

$::RD_TRACE = 1;
$::RD_HINT  = 1;


#============================================
# GRAMMAR DEFINITION HERE
#============================================
my $grammar =
q{
    Para: <skip:
                qr{
                    (?> \s
                    | /\* .*? \*/
                    | //[^\n]*
                    )*
          }sx> List(s) /\Z/
                |{ use Data::Dumper 'Dumper';
                    print "$_->[0]\n" for @{$thisparser->{errors}};
                    exit;
                 }
    
    List:       'SECTION_START'
                SECTION_NAME
                SECTION_CONTENT(s?)
                SECTION_END
                |<error>
    
    SECTION_NAME:
                /\w+/ {print "section name is $item[1]\n";}
                |<error>
                
    SECTION_CONTENT: 
                # HOW TO DO IT???
                /.*/ ...SECTION_END ...!SECTION_START
                    {print "section content is $item[1]\n";}
                |<error>
                
    SECTION_END:
                'SECTION_END' {print "section END is $item[1]\n";}
                |<error>
    
};


#============================================
# MAIN PROGRAM STARTS HERE
#============================================

my $parse = new Parse::RecDescent ($grammar);

my $text = do { local $/; <DATA> };
$parse->Para($text);


__DATA__
/***********************************/
/* comment1    */
/***********************************/
SECTION_START   BANK001 /* be there */
        DESCRIPTION = "US BANK";
SECTION_END

/********************************/
/* comment2*/
/********************************/
SECTION_START BANK002    /* no DESCRIPTION 
                           here!!! */

SECTION_END

/************************************/
/* comment3    */
/************************************/
SECTION_START BANK003    //mutiple-line DESCRIPTION
DESCRIPTION = {
    UK BANK;
    UN BANK;
};
SECTION_END

Re: PRD parser problem: How to deal with multiple lines by philcrow (Priest) on Jun 13, 2008 at 13:31 UTC

This is about choices. From the examples, there are three possible section contents: a single-line description, a description with a brace block, and blank. That leads to three productions (including blank, which is not an <error>). You could start like this:

    section_content: description
                   |

Then define a description:

    description: 'DESCRIPTION' '=' statement
               | 'DESCRIPTION' '=' '{' statement(s) '}'

Finally, a statement:

    statement: ...

The key is to think: what are the choices for a valid description or statement or whatever? Each choice is an alternative. Each alternative is made of pieces which themselves might have choices.

P.S. By convention, grammars use upper case on the left side of a rule only when defining a token. Other left sides, which are built from other things, are usually lower case.

Phil
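The alternatives described above could be assembled into a runnable sketch along the following lines. The rule names, the `statement` pattern (`/[^;{}]+/` followed by `';'`), the ordering of the two `description` alternatives, and the returned data structure are all assumptions made for illustration. The `<skip:...>` pattern is a simplified variant of the one in the question.

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $grammar = q{
    # Skip whitespace, /* ... */ comments and // comments between tokens.
    para: <skip: qr{(?:\s|/[*].*?[*]/|//[^\n]*)*}s> sections
        { $return = $item{sections} }

    sections: section(s) /\Z/
        { $return = $item[1] }

    section: 'SECTION_START' section_name section_content 'SECTION_END'
        { $return = { name => $item[2], statements => $item[3] } }

    section_name: /\w+/

    # Either a description or nothing at all (empty is valid, not an error).
    section_content: description
        | { $return = [] }

    # Brace block is tried first so that '{' is never taken as a statement.
    description: 'DESCRIPTION' '=' '{' statement(s?) '}' ';'
            { $return = $item[4] }
        | 'DESCRIPTION' '=' statement
            { $return = [ $item[3] ] }

    # Assumed statement shape: text without ';' or braces, then ';'.
    statement: /[^;{}]+/ ';'
        { $return = $item[1] }
};

my $parser = Parse::RecDescent->new($grammar) or die "Bad grammar\n";

my $text = <<'END';
SECTION_START   BANK001 /* be there */
        DESCRIPTION = "US BANK";
SECTION_END

SECTION_START BANK002    /* no DESCRIPTION
                           here!!! */

SECTION_END

SECTION_START BANK003    // multiple-line DESCRIPTION
DESCRIPTION = {
    UK BANK;
    UN BANK;
};
SECTION_END
END

my $sections = $parser->para($text) or die "Parse failed\n";

for my $s (@$sections) {
    printf "%s: %s\n", $s->{name}, join(' | ', @{ $s->{statements} });
}
```

Returning data from the actions instead of printing inside them keeps the grammar separate from whatever is later done with the sections.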
Re^2: PRD parser problem: How to deal with multiple lines by Hanken (Acolyte) on Jun 16, 2008 at 08:47 UTC

Hi Philcrow, thanks for your reply! I have now changed my grammar as follows:

    List:       'SECTION_START'
                SECTION_NAME
                SECTION_CONTENT
                'SECTION_END'
                |<error>

    SECTION_NAME:
                /\w+/ {print "section name is $item[1]\n";}
                |<error>

    SECTION_CONTENT:
                Description
                |

    Description:
                'DESCRIPTION' '=' Statement
                |'DESCRIPTION' '=' '{' Statement(s?) '}' ';'

    Statement:  /.+\n/

But I still cannot parse the multi-line content inside the description braces:

    section name is BANK001
    section name is BANK002
    section name is BANK003
    Invalid List: Was expecting 'SECTION END' but found "UK BANK; " instead

How can I solve it?
Re^3: PRD parser problem: How to deal with multiple lines by philcrow (Priest) on Jun 17, 2008 at 14:53 UTC

I think your statement rule needs a trailing semicolon. If you have further problems, you should trace the execution. Do this by adding these statements to the program before constructing the parser:

    $::RD_TRACE = 1;
    $::RD_HINT  = 1;

The first one is the actual tracer. Be warned that it will generate a lot of output. Reading it will help you understand what the parser is doing and will probably lead you to the error.

Phil
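One possible reading of the error reported in the parent post, sketched with plain Perl regexes rather than Parse::RecDescent itself: the pattern `/.+\n/` also matches the lone `{` that opens the block, so the single-line alternative can succeed before the brace alternative is tried. The variable names and the stricter, semicolon-terminated pattern are assumptions for illustration.

```perl
use strict;
use warnings;

# The input as it might look just after 'DESCRIPTION' '=' has matched
# (leading whitespace already skipped).
my $rest = "{\n    UK BANK;\n    UN BANK;\n};\nSECTION_END\n";

# The Statement pattern from the parent post: anything up to end of line.
my ($loose) = $rest =~ /\A(.+\n)/;

# A stricter pattern (assumption): text without ';' or braces, then ';'.
my ($strict) = $rest =~ /\A([^;{}]+);/;

printf "loose pattern matched:  %s\n", defined $loose  ? 'yes' : 'no';
printf "strict pattern matched: %s\n", defined $strict ? 'yes' : 'no';
```

If the loose pattern swallows the opening brace, the parser would then expect 'SECTION_END' where the first statement begins. Requiring the semicolon and excluding braces prevents that, and the trace output should show which alternative is actually taken.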

