PerlMonks
Is the skip: directive broken in Parse::RecDescent? [Solved - PEBKAC]
by Hercynium (Hermit) on Aug 11, 2008 at 21:21 UTC ( [id://703751]=perlquestion )
Hercynium has asked for the wisdom of the Perl Monks concerning the following question:
So, I've been happily learning how to use grammars for parsing with Parse::RecDescent, and I've been very pleased with its power and flexibility so far... but I'm stumbling over a problem, and for the life of me I can't understand why it's happening! I highly doubt that this could be a bug in PRD (it's used by too many people), but even the most bare code demonstrates this frustrating problem.

Basically, it's this: changing the prefix pattern has NO effect! If I print out $skip it shows that it is set as expected, but the behavior of PRD does not change from the default. This happens whether I am using a skip: directive, setting $skip from within an Action, or setting $Parse::RecDescent::skip from outside the grammar code.

Here's a little demonstration of what I'm getting... Code like this:
Outputs this:
I'm pretty certain it's not a problem with the regexes I'm using because when I do something like this instead:
I get this output:
Update: As I suspected, the "skip" or "terminal prefix" functionality is *not* broken... but it is not quite as DWIMmy as I was expecting with regard to how the specified regular expression is used.

I still don't think I understand the subtle details, but as far as I can tell, one should keep in mind that the skip regex (aka terminal prefix) is matched ONLY ONCE. Therefore, one should probably wrap the whole thing in a group followed by an asterisk, to ensure *everything* one wants to skip will be consumed in *one pass*.

To further show what I mean, here is one of the many non-working regexes that brought me here:

    /(?: \# .*? \n? | \s* )?/msx

It will match only ONE INSTANCE of a comment or of repeated whitespace. My example text has several adjoining instances of comments and whitespace, and only the first match was being consumed!

Here is the regex that does what I want:

    /(?: \# .*? \n | \s )*/msx

As you can see, it consumes ALL comments AND whitespace until nothing matches. SMALL change, BIG difference!

I now have this working the way I want, by assigning it to $skip in the "start-up actions":

    $skip = '(?msx: \# .*? \n | \s )*'

This has been another fun and edifying expedition, and if anyone reading this has any additional questions, I am happy to share whatever meager knowledge I have gained :)
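
The difference between the two patterns can be reproduced outside of a grammar with a minimal sketch in plain Perl. It assumes that "matched only once" can be modeled as a single substitution anchored at the start of the remaining text; that is an approximation of the behavior described above, not a copy of PRD's internals. The helper remaining_after_skip and the sample text are hypothetical and exist only for illustration.

```perl
use strict;
use warnings;

# Illustrative input (not from the original post): comments and blank
# space stacked in front of the first real token.
my $text = "# first comment\n\n# second comment\n  token";

# Hypothetical helper: apply a skip pattern exactly once, anchored at the
# start of the input, and return whatever is left over.
# ASSUMPTION: this models "matched only once"; it is not PRD's own code.
sub remaining_after_skip {
    my ($skip, $input) = @_;
    $input =~ s/\A(?:$skip)//;
    return $input;
}

# The two patterns quoted in the update above.
my $single   = '(?msx: \# .*? \n? | \s* )?';
my $repeated = '(?msx: \# .*? \n | \s )*';

printf "single:   [%s]\n", remaining_after_skip($single,   $text);
printf "repeated: [%s]\n", remaining_after_skip($repeated, $text);   # [token]
```

With the starred group, one application strips both comments and the whitespace around them and leaves just the token. The optional group stops as soon as one alternative has matched, so the second comment is still sitting in front of the token when the next terminal is tried.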