aufflick has asked for the wisdom of the Perl Monks concerning the following question:
Hi all,
Q1: What's the policy on StackOverflow cross-posts? I've been away from PM for a number of years, so I'm a little out of date on things :)
SO xpost link: http://stackoverflow.com/questions/34889351/whitespace-important-parsing-with-parserecdescent-eg-haml-python
Actual Q:
I'm trying to parse HAML (haml.info) with Parse::RecDescent. If you don't know HAML, the problem in question is the same as parsing Python: blocks of syntax are grouped by indentation level.
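To make the grouping concrete, here is a minimal sketch in plain Perl (no Parse::RecDescent involved) of the tree that indentation implies. The `indent_tree` helper and the `text`/`children` keys are hypothetical names chosen for illustration, and the sketch assumes indentation uses spaces only.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Parse::RecDescent): turn indented text
# into a nested tree of { text => ..., children => [...] } nodes.
# Assumes indentation consists of spaces only.
sub indent_tree {
    my ($src) = @_;
    my $root  = { text => undef, children => [] };
    my @stack = ( [ -1, $root ] );    # [ indent level, node ], innermost last

    for my $line ( split /\n/, $src ) {
        next unless $line =~ /\S/;    # skip blank lines
        my ( $spaces, $text ) = $line =~ /^( *)(.*)$/;
        my $level = length $spaces;

        # Pop back to the nearest line indented less than this one;
        # that line is the parent. The root (level -1) is never popped.
        pop @stack while $stack[-1][0] >= $level;

        my $node = { text => $text, children => [] };
        push @{ $stack[-1][1]{children} }, $node;
        push @stack, [ $level, $node ];
    }
    return $root;
}

my $tree = indent_tree("%p\n  %span foo\n%div bar\n");
```

Here `%span foo` ends up as a child of `%p`, while `%div bar` is a sibling of `%p`; that nesting is what the grammar needs to reproduce.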
Starting with a very simple subset, I've tried a few approaches but I think I don't quite understand either the greediness or recursive order of P::RD. Given the haml:
    %p
      %span foo
The simplest grammar I have that I think should work is below (it includes some rules that the above snippet doesn't need):
    <autotree>

    startrule           : <skip:''> block(s?)

    non_space           : /[^ ]/
    space               : ' '

    indent              : space(s?)
    indented_line       : indent line
    indented_lines      : indented_line(s) <reject: do { Perl6::Junction::any(map { $_->level } @{$item[1]}) != $item[1][0]->level }>

    block               : indented_line block <reject: do { $item[2]->level <= $item[1]->level }>
                        | indented_lines

    line                : single_line | multiple_lines

    single_line         : line_head space line_body newline | line_head space(s?) newline | plain_text newline

    # ALL subsequent lines ending in | are consumed
    multiple_lines      : line_head space line_body continuation_marker newline continuation_line(s)
    continuation_marker : space(s) '|' space(s?)
    continuation_line   : space(s?) line_body continuation_marker

    newline             : "\n"
    line_head           : haml_comment | html_element
    haml_comment        : '-#'
    html_element        : '%' tag

    # TODO: xhtml tags technically allow unicode
    tag_start_char      : /[:_a-z]/i
    tag_char            : /[-:_a-z.0-9]/i
    tag                 : tag_start_char tag_char(s?)

    line_body           : /.*/

    plain_text          : backslash ('%' | '!' | '.' | '#' | '-' | '/' | '=' | '&' | ':' | '~') /.*/ | /.*/

    backslash           : '\\'
The problem is in the block definition. With the grammar as written above, it does not capture any of the text from the snippet, though it does capture the following correctly:
    -# haml comment
    %p a paragraph
If I remove the second reject line from the above (the one on the first block rule) then it does capture everything, but of course incorrectly grouped since the first block will slurp all lines, irrespective of indentation.
I've also tried using lookahead actions to inspect $text and a few other approaches with no luck.
Can anyone (a) explain why the above doesn't work and/or (b) suggest an approach that doesn't use Perl actions/rejects? I tried grabbing the number of spaces in the indent and then using that in an interpolated lookahead condition for the number of spaces in the next line, but I could never quite get the interpolation syntax right (since it requires an arrow operator).
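For (b), one direction that avoids comparing indent levels inside the grammar at all is to rewrite the input first, replacing leading whitespace with explicit block markers, much as Python's tokenizer emits INDENT and DEDENT tokens. The following is only a sketch in plain Perl: `mark_indentation` and the marker strings are hypothetical names, not anything Parse::RecDescent provides, and the sketch assumes space-only indentation and that no input line consists of a marker word.

```perl
use strict;
use warnings;

# Hypothetical preprocessing pass: strip leading spaces and emit explicit
# INDENT / DEDENT marker lines wherever the indentation level changes.
# Assumptions: indentation is spaces only, and no input line is literally
# "INDENT" or "DEDENT". Inconsistent dedents are not diagnosed.
sub mark_indentation {
    my ($src) = @_;
    my @levels = (0);    # stack of currently open indentation levels
    my @out;

    for my $line ( split /\n/, $src ) {
        next unless $line =~ /\S/;    # skip blank lines
        my ( $spaces, $text ) = $line =~ /^( *)(.*)$/;
        my $level = length $spaces;

        if ( $level > $levels[-1] ) {
            # Deeper than the enclosing block: open a new one.
            push @levels, $level;
            push @out, 'INDENT';
        }
        while ( $level < $levels[-1] ) {
            # Shallower: close every block deeper than this line.
            pop @levels;
            push @out, 'DEDENT';
        }
        push @out, $text;
    }

    # Close any blocks still open at end of input.
    push @out, ('DEDENT') x $#levels;
    return join( "\n", @out ) . "\n";
}

my $marked = mark_indentation("%p\n  %span foo\n%div bar\n");
```

With the markers in place, a block rule would only need to match a line optionally followed by an INDENT ... DEDENT group, so no reject directive or interpolated lookahead would have to compare levels.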
Replies are listed 'Best First'.
Re: Whitespace-important parsing with Parse::RecDescent (eg. HAML, Python)
by Anonymous Monk on Jan 21, 2016 at 00:22 UTC
by aufflick (Deacon) on Jan 21, 2016 at 02:02 UTC
by aufflick (Deacon) on Jan 21, 2016 at 01:59 UTC
Re: Whitespace-important parsing with Parse::RecDescent (eg. HAML, Python)
by Anonymous Monk on Jan 21, 2016 at 00:10 UTC
Re: Whitespace-important parsing with Parse::RecDescent (eg. HAML, Python) [SO crosspost - sorry!]
by stevieb (Canon) on Jan 21, 2016 at 00:14 UTC