Whitespace-important parsing with Parse::RecDescent (e.g. HAML, Python)
by aufflick (Deacon) on Jan 20, 2016 at 23:29 UTC
aufflick has asked for the wisdom of the Perl Monks concerning the following question:
Q1: What's the policy on Stack Overflow cross-posts? I've been away from PM for a number of years, so I'm a little out of date on things :)
I'm trying to parse HAML (haml.info) with Parse::RecDescent. If you don't know HAML, the problem in question is the same as parsing Python: blocks of syntax are grouped by indentation level.
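To make the grouping concrete, here is a minimal plain-Perl sketch (no P::RD involved) of the kind of nesting I'm after. The function name `group_by_indent` and the sample HAML lines are made up for illustration, and it assumes indentation uses spaces only:

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Parse::RecDescent: nest lines by their
# leading spaces. Returns an array ref of nodes shaped like
# { indent => N, text => '...', children => [ ... ] }.
sub group_by_indent {
    my @lines = @_;
    my $root  = { indent => -1, children => [] };
    my @stack = ($root);    # currently open blocks, innermost last

    for my $line (@lines) {
        next if $line =~ /^\s*$/;    # ignore blank lines

        my ($spaces, $text) = $line =~ /^( *)(.*)$/;
        my $node = { indent => length $spaces, text => $text, children => [] };

        # Close every open block at the same or a deeper indentation level.
        # The root has indent -1, so it is never popped.
        pop @stack while $stack[-1]{indent} >= $node->{indent};

        push @{ $stack[-1]{children} }, $node;
        push @stack, $node;
    }
    return $root->{children};
}

# Made-up sample input, not the snippet from my real document.
my $tree = group_by_indent(
    '%html',
    '  %head',
    '  %body',
    '    %p',
);
```

With that sample, `%html` ends up as the single top-level node, `%head` and `%body` are its children, and `%p` is nested under `%body`. That is the grouping I want the grammar to produce.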
Starting with a very simple subset, I've tried a few approaches, but I don't think I quite understand either the greediness or the recursion order of P::RD. Given the HAML:
The simplest grammar I have that I think should work is (with bits unnecessary for the above snippet):
The problem is in the block definition. As above, it does not capture any of the text, though it does capture the following correctly:
If I remove the second reject line from the above (the one on the first block rule) then it does capture everything, but of course incorrectly grouped since the first block will slurp all lines, irrespective of indentation.
I've also tried using lookahead actions to inspect $text and a few other approaches with no luck.
Can anyone (a) explain why the above doesn't work, and/or (b) suggest an approach that doesn't use Perl actions/rejects? I tried capturing the number of spaces in the indent and then using that in an interpolated lookahead condition for the number of spaces on the next line, but I could never quite get the interpolation syntax right (since it requires an arrow operator).