
PRD parser problem: How to deal with multiple lines

by Hanken (Acolyte)
on Jun 13, 2008 at 02:31 UTC

Hanken has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, please see my simple script below. I need to extract the content of each section, but I fail when the content spans multiple lines, as in:
DESCRIPTION = {
UK BANK;
UN BANK;
};
The section content is whatever lies between the section name and the section end.

#! /usr/local/bin/perl -sw
BEGIN { close STDERR and open STDERR, '>./STDERR' or die $!; }
use Parse::RecDescent;
#use strict;
#use warnings;
$::RD_TRACE = 1;
$::RD_HINT = 1;
#============================================
# GRAMMAR DEFINITION HERE
#============================================
my $grammar = q{
Para: <skip: qr{ (?> \s | /\* .*? \*/ | //[^\n]* )* }sx> List(s) /\Z/
    |{ use Data::Dumper 'Dumper'; print "$_->[0]\n" for @{$thisparser->{errors}}; exit; }
List: 'SECTION_START' SECTION_NAME SECTION_CONTENT(s?) SECTION_END
    |<error>
SECTION_NAME: /\w+/ {print "section name is $item[1]\n";}
    |<error>
SECTION_CONTENT: # HOW TO DO IT???
    /.*/ ...SECTION_END ...!SECTION_START {print "section content is $item[1]\n";}
    |<error>
SECTION_END: 'SECTION_END' {print "section END is $item[1]\n";}
    |<error>
};
#============================================
# MAIN PROGRAM STARTS HERE
#============================================
my $parse = new Parse::RecDescent ($grammar);
my $text = do { local $/; <DATA> };
$parse->Para($text);
__DATA__
/***********************************/
/* comment1 */
/***********************************/
SECTION_START BANK001
/* be there */
DESCRIPTION = "US BANK";
SECTION_END
/********************************/
/* comment2*/
/********************************/
SECTION_START BANK002
/* no DESCRIPTION here!!! */
SECTION_END
/************************************/
/* comment3 */
/************************************/
SECTION_START BANK003
//mutiple-line DESCRIPTION
DESCRIPTION = {
UK BANK;
UN BANK;
};
SECTION_END

Replies are listed 'Best First'.
Re: PRD parser problem: How to deal with multiple lines
by philcrow (Priest) on Jun 13, 2008 at 13:31 UTC
    This is about choices. From the examples, there are three possible section contents: single line description, description with a brace block, and blank. That leads to three productions (including blank which is not an <error>).

    You could start that like:

    section_content: description
                   |
    Then define a description:
    description: 'DESCRIPTION' '=' statement
               | 'DESCRIPTION' '=' '{' statement(s) '}'
    Finally, a statement:
    statement: ...

    The key is to think: what are the choices for a valid description or statement or whatever? Each choice is an alternative. Each alternative is made of pieces which themselves might have choices.

    p.s. Normal conventions of grammars have us use upper case on the left side of a rule only if we are defining a token. Other left sides, which are built from other things, are usually lower case.

    Phil

    The Gantry Web Framework Book is now available.
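
The three choices described above can be sketched as a complete grammar. This is a minimal sketch rather than the poster's own solution: the rule names `startrule` and `name`, the `statement` pattern, and the decision to list the brace-block alternative first are all assumptions made for illustration. The sample input omits the comments from the original data so that Parse::RecDescent's default whitespace skipping is enough.

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Grammar sketch: section content is a brace block, a single statement,
# or nothing at all. The statement pattern is an assumption.
my $grammar = <<'END_GRAMMAR';
startrule: section(s) /\Z/
    { $return = $item[1] }

section: 'SECTION_START' name section_content 'SECTION_END'
    { $return = { name => $item[2], content => $item[3] } }

name: /\w+/

# The second production is a bare action, so an empty section still succeeds.
section_content: description
    | { $return = [] }

# The brace block is tried first, before the single statement form.
description: 'DESCRIPTION' '=' '{' statement(s) '}' ';'
        { $return = $item[4] }
    | 'DESCRIPTION' '=' statement
        { $return = [ $item[3] ] }

# Assumed form: text free of semicolons and braces, then a semicolon.
statement: /[^;{}]+/ ';'
    { $return = $item[1] }
END_GRAMMAR

# Sample input modelled on the question, with the comments left out.
my $text = <<'END_DATA';
SECTION_START BANK001
DESCRIPTION = "US BANK";
SECTION_END
SECTION_START BANK002
SECTION_END
SECTION_START BANK003
DESCRIPTION = {
UK BANK;
UN BANK;
};
SECTION_END
END_DATA

my $parser   = Parse::RecDescent->new($grammar) or die "Bad grammar\n";
my $sections = $parser->startrule($text)        or die "Parse failed\n";

# Each section is a hash with a name and a list of statements.
for my $section (@$sections) {
    my @statements = @{ $section->{content} };
    print "$section->{name}: ",
        (@statements ? join(' | ', @statements) : '(no description)'), "\n";
}
```

Returning a data structure from each rule, instead of printing inside the actions, keeps the grammar separate from whatever is later done with the sections.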
      Hi philcrow, thanks for your reply! I have now changed my grammar as follows:
      List: 'SECTION_START' SECTION_NAME SECTION_CONTENT 'SECTION_END'
          |<error>
      SECTION_NAME: /\w+/ {print "section name is $item[1]\n";}
          |<error>
      SECTION_CONTENT: Description
          |
      Description: 'DESCRIPTION' '=' Statement
          |'DESCRIPTION' '=' '{' Statement(s?) '}' ';'
      Statement: /.+\n/
      But I still cannot parse the multi-line content inside the description braces:
      section name is BANK001
      section name is BANK002
      section name is BANK003
      Invalid List: Was expecting 'SECTION END' but found "UK BANK; " instead
      How can I solve it?
        I think your statement rule needs a trailing semicolon. If you have further problems, you should trace the execution. Do this by adding these statements to the program before constructing the parser:
        $::RD_TRACE = 1;
        $::RD_HINT = 1;
        The first one is the actual tracer. Be warned that it will generate a lot of output. Reading it will help you understand what the parser is doing and will probably lead you to the error.

        Phil

        The Gantry Web Framework Book is now available.
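
One likely reading of the error in the follow-up, offered as a sketch and not as the thread's confirmed diagnosis: the `Statement` pattern `/.+\n/` is happy to match the line holding the opening brace, so the first `Description` alternative succeeds and the parser then expects `SECTION_END` where `UK BANK;` appears. The plain-regex check below needs no parser. The variable names and the two alternative patterns are assumptions made for illustration.

```perl
use strict;
use warnings;

# What is left of the BANK003 input once 'DESCRIPTION' and '=' are consumed
# and leading whitespace has been skipped.
my $rest = "{\nUK BANK;\nUN BANK;\n};\nSECTION_END\n";

# The pattern from the revised grammar matches the brace line by itself.
my ($line_match) = $rest =~ /\A(.+\n)/;

# Requiring a trailing semicolon alone still swallows the brace.
my ($semi_match) = $rest =~ /\A([^;]+;)/;

# Excluding braces as well (an assumed pattern) makes the match fail here,
# which would let the brace-block alternative be tried instead.
my $brace_safe = $rest =~ /\A[^;{}]+;/ ? 1 : 0;

print "line pattern matched the brace line\n" if $line_match eq "{\n";
print "semicolon pattern started at the brace\n" if $semi_match =~ /\A\{/;
print "brace-excluding pattern matched: $brace_safe\n";
```

If this reading is right, either a statement pattern that cannot start with a brace or listing the brace-block alternative first should avoid the problem; the trace output is the way to confirm it.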
