PerlMonks  


Hi all,

Q1: What's the policy on Stack Overflow cross-posts? I've been away from PM for a number of years, so I'm a little out of date :)

SO xpost link: http://stackoverflow.com/questions/34889351/whitespace-important-parsing-with-parserecdescent-eg-haml-python

Actual Q:

I'm trying to parse HAML (haml.info) with Parse::RecDescent. If you don't know HAML, the problem in question is the same as parsing Python: blocks of syntax are grouped by indentation level.

Starting with a very simple subset, I've tried a few approaches but I think I don't quite understand either the greediness or recursive order of P::RD. Given the haml:

    %p
      %span foo

The simplest grammar I have that I think should work is below (it includes some rules that the snippet above doesn't need):

    <autotree>

    startrule           : <skip:''> block(s?)
    non_space           : /[^ ]/
    space               : ' '
    indent              : space(s?)
    indented_line       : indent line
    indented_lines      : indented_line(s) <reject: do { Perl6::Junction::any(map { $_->level } @{$item[1]}) != $item[1][0]->level }>
    block               : indented_line block <reject: do { $item[2]->level <= $item[1]->level }>
                        | indented_lines
    line                : single_line | multiple_lines
    single_line         : line_head space line_body newline
                        | line_head space(s?) newline
                        | plain_text newline

    # ALL subsequent lines ending in | are consumed
    multiple_lines      : line_head space line_body continuation_marker newline continuation_line(s)
    continuation_marker : space(s) '|' space(s?)
    continuation_line   : space(s?) line_body continuation_marker

    newline             : "\n"
    line_head           : haml_comment | html_element
    haml_comment        : '-#'
    html_element        : '%' tag

    # TODO: xhtml tags technically allow unicode
    tag_start_char      : /[:_a-z]/i
    tag_char            : /[-:_a-z.0-9]/i
    tag                 : tag_start_char tag_char(s?)

    line_body           : /.*/
    plain_text          : backslash ('%' | '!' | '.' | '#' | '-' | '/' | '=' | '&' | ':' | '~') /.*/
                        | /.*/
    backslash           : '\\'

The problem is in the block definition. As above, it does not capture any of the text, though it does capture the following correctly:

    -# haml comment
    %p a paragraph

If I remove the second reject directive from the grammar above (the one on the first block production), then it does capture everything, but grouped incorrectly: the first block slurps all lines, irrespective of indentation.

I've also tried using lookahead actions to inspect $text and a few other approaches with no luck.

Can anyone (a) explain why the above doesn't work and/or (b) suggest an approach that avoids Perl actions/rejects? I tried capturing the number of spaces in the indent and then using that count in an interpolated lookahead condition for the number of spaces on the next line, but I could never quite get the interpolation syntax right (since it requires an arrow operator).
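One way to sidestep indent comparisons inside the grammar is a pre-pass that turns indentation changes into explicit marker lines, in the spirit of the INDENT/DEDENT tokens that Python's own tokenizer emits. The following is only a minimal sketch in core Perl, not a tested Parse::RecDescent solution. The function name mark_indentation and the marker strings <INDENT> and <DEDENT> are hypothetical, and it assumes space-only indentation and does not treat HAML's | continuation lines specially.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Parse::RecDescent): rewrite
# indentation changes as explicit marker lines so that a grammar can
# match literal tokens instead of comparing indent levels in actions.
sub mark_indentation {
    my ($source, $indent_tok, $dedent_tok) = @_;
    $indent_tok //= '<INDENT>';    # assumed marker, must not occur in real input
    $dedent_tok //= '<DEDENT>';    # assumed marker, must not occur in real input

    my @levels = (0);    # stack of currently open indentation widths
    my @out;

    for my $line (split /\n/, $source) {
        next if $line =~ /^\s*$/;          # blank lines do not affect nesting
        my ($spaces, $rest) = $line =~ /^( *)(.*)$/;
        my $width = length $spaces;

        if ($width > $levels[-1]) {        # deeper: open one new block
            push @levels, $width;
            push @out, $indent_tok;
        }
        while ($width < $levels[-1]) {     # shallower: close open blocks
            pop @levels;
            push @out, $dedent_tok;
        }
        die "inconsistent indentation: '$line'\n"
            if $width != $levels[-1];

        push @out, $rest;                  # the line with its indent stripped
    }
    push @out, ($dedent_tok) x $#levels;   # close blocks still open at EOF
    return join("\n", @out) . "\n";
}

print mark_indentation("%p\n  %span foo\n%div bar\n");
```

With the markers in place, a block rule could in principle match a line followed by the literal markers wrapped around its nested blocks, so no reject directive would have to compare levels. How well that fits the rest of the grammar above (for example the <skip:''> setting and the newline rule) is untested here.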


In reply to Whitespace-important parsing with Parse::RecDescent (eg. HAML, Python) by aufflick
