Monks,

I am trying my hand at parsing data with Parse::RecDescent for the very first time. I managed to cobble together a grammar without too much difficulty, and it successfully parses the example files I have fed to it. So far so good.

I am having some difficulties, however, figuring out how to handle comments. My language allows C-style comments (/* ... */) as well as C++ line comments (// ...) [1]. I would like to process these comments in my parser, so setting $Parse::RecDescent::skip to a regex that skips over them is not an option (although I have done that for now, just to get going).

The only option I see is to define rules for a comment [2]:

    comment: c_comment | line_comment

    c_comment: qr!
        /\*                 # C-Style comments open with a "/*"...
        (?:                 # followed by...
            [^*]            # non-"*" characters
          | \*(?=[^\/])     # or a "*" and a non-slash character
        )*
        \*/                 # ...and closed by a "*/"
    !x;

    line_comment: qr!//[^\n]*!;

... and then sprinkle this production liberally throughout the rest of the grammar:

    # Yuck!
    rule1: comment(s?) subrule1
         | comment(s?) subrule2
    rule1: comment(s?) subrule3 comment(s?) subrule4
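
The two comment patterns can be sanity-checked outside the grammar with plain Perl before wiring them into Parse::RecDescent. This is only a sketch: `leading_comment` is a made-up helper name, the sample strings are invented, and the patterns are compiled as ordinary `qr` regexes rather than as grammar terminals.

```perl
use strict;
use warnings;

# The two patterns from the grammar, compiled as ordinary Perl regexes.
my $c_comment = qr{
    /\*                 # opening "/*"
    (?: [^*]            # any character other than "*" (newlines included)
      | \*(?=[^/])      # or a "*" that is not followed by "/"
    )*
    \*/                 # closing "*/"
}x;
my $line_comment = qr{//[^\n]*};

# Hypothetical helper: return the comment at the very start of $text,
# or undef if $text does not begin with a complete comment.
sub leading_comment {
    my ($text) = @_;
    return $text =~ /\A($c_comment|$line_comment)/ ? $1 : undef;
}

# Invented samples covering the usual trouble spots.
for my $sample ("/* plain */ rest", "/***/ rest", "// line\nrest", "/* unterminated") {
    my $found = leading_comment($sample);
    print defined $found ? "matched: $found\n" : "no comment at start\n";
}
```

Runs of stars before the closing slash and unterminated comments are the cases most likely to trip a hand-written pattern, so they are worth keeping in any such check.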

Can any monks out there lead me down the correct path? While I would expect the above to do what I want, it feels like a hack. Is there a smart way around this?

Thanks for your help.

Update 1: Fixed c-comment regex as per Anonymous Monk's suggestion.
Update 2: The "big picture":

I am parsing configuration files, with a goal of amending small parts of them while leaving the majority of the file unchanged. So even though comments and whitespace are irrelevant to the semantics of the file, I need to keep track of them so they can be rendered when I print the amended version of the configuration file back out.

So if I were to change "sub-setting1" in the example below from "value1" to "value2", I would want to go from...


    setting = (
        sub-setting1 = "value1";
        optimize = "12"; // unreliable!
        /* foo = "bar";
           Commented out, not working right now*/
    );

... to ...

    setting = (
        sub-setting1 = "value2";
        optimize = "12"; // unreliable!
        /* foo = "bar";
           Commented out, not working right now*/
    );

The point being, I don't lose the formatting.
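
To make the round-trip requirement concrete, here is a minimal sketch in plain Perl. It does not use Parse::RecDescent: it swaps in a simple lossless tokenizer that keeps whitespace and comments as tokens of their own. The helper names `tokenize` and `set_value`, the token types, and the assumption that values are always double-quoted strings are all illustrative and not taken from my real grammar.

```perl
use strict;
use warnings;

# Hypothetical helper: split $src into [type, text] pairs without
# dropping a single character, so joining the texts reproduces $src.
sub tokenize {
    my ($src) = @_;
    my @tokens;
    pos($src) = 0;
    while (pos($src) < length $src) {
        if    ($src =~ /\G(\s+)/gc)                         { push @tokens, [ws      => $1] }
        elsif ($src =~ m{\G(/\*(?:[^*]|\*(?=[^/]))*\*/)}gc) { push @tokens, [comment => $1] }
        elsif ($src =~ m{\G(//[^\n]*)}gc)                   { push @tokens, [comment => $1] }
        elsif ($src =~ /\G("[^"]*")/gc)                     { push @tokens, [string  => $1] }
        elsif ($src =~ /\G([\w-]+)/gc)                      { push @tokens, [word    => $1] }
        else  { $src =~ /\G(.)/gcs;                           push @tokens, [punct   => $1] }
    }
    return @tokens;
}

# Hypothetical helper: replace the quoted value assigned to $name and
# return the re-rendered text; every other token is emitted untouched.
sub set_value {
    my ($name, $value, @tokens) = @_;
    my $seen_name = 0;
    for my $tok (@tokens) {
        my ($type, $text) = @$tok;
        next if $type eq 'ws' || $type eq 'comment';    # formatting never matters here
        if    ($type eq 'word' && $text eq $name)  { $seen_name = 1 }
        elsif ($seen_name && $type eq 'string')    { $tok->[1] = qq{"$value"}; last }
        elsif ($seen_name && !($type eq 'punct' && $text eq '=')) { $seen_name = 0 }
    }
    return join '', map { $_->[1] } @tokens;
}

# Usage: change one value; the comments and spacing come back unchanged.
my $config = qq{a = "1"; // keep me\n/* a = "2"; */\n};
print set_value('a', '9', tokenize($config));
```

Because each comment is a single token, a setting name that appears inside a comment is never mistaken for a real assignment. Whether the same idea maps cleanly onto a Parse::RecDescent grammar is exactly what I am unsure about.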

[1] Not sure these are the correct terms.
[2] While it is not the primary purpose of this question, if anybody spots an error in my comment-regex, please let me know!


In reply to Handling Comments with Parse::RecDescent by crashtest
