Monks,

I am trying my hand at parsing data with Parse::RecDescent for the very first time. I managed to cobble together a grammar without too much difficulty, and it successfully parses the example files I have fed to it. So far so good.

I am having some difficulty, however, figuring out how to handle comments. My language allows C-style comments (/* ... */) as well as C++ line comments (// ...)¹. I want to process these comments in my parser, so simply setting $Parse::RecDescent::skip to a regex that skips over them is not an option (although I have done that for now, just to get going).
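
For illustration, a minimal sketch of what such a stop-gap skip pattern might look like. The exact regex is an assumption, not the one from the original script, and the sample input is made up; the pattern is exercised with plain Perl here so that it runs without the module installed.

```perl
use strict;
use warnings;

# Assumed skip pattern: any run of whitespace, /* ... */ comments,
# or // line comments.
my $skip = qr{
    (?:
        \s+             # whitespace
      | /\* .*? \*/     # C-style comment, non-greedy, may span lines
      | // [^\n]*       # line comment, up to the end of the line
    )*
}xs;

# With Parse::RecDescent loaded, this would be assigned before parsing:
#   $Parse::RecDescent::skip = $skip;

# Hypothetical input: everything before "name" should be skipped.
my $input = "  /* note */ // trailing\n  name = 1;";
(my $rest = $input) =~ s/\A$skip//;
print "$rest\n";    # prints "name = 1;"
```

The drawback is the one described above: whatever the skip pattern consumes never reaches a rule, so the comments are lost to the parser.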

The only option I see is to define the rules for a comment²:

comment: c_comment | line_comment

c_comment:
m!
    /\*             # C-Style comments open with a "/*"...
    (?:             # followed by...
        [^*]        # non-"*" characters
     |  \*(?=[^\/]) # or a "*" and a non-slash character
    )*
    \*/             # ...and closed by a "*/"
!x

line_comment: m!//[^\n]*!
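
One way to sanity-check the two comment patterns outside the grammar is to compile them as ordinary Perl regexes and run them over a few edge cases. This is only a sketch: the helper name is_comment and the sample strings are made up for illustration.

```perl
use strict;
use warnings;

# The two patterns from the grammar, compiled as ordinary Perl regexes.
my $c_comment = qr!
    /\*               # opening "/*"
    (?:
        [^*]          # any non-"*" character
     |  \*(?=[^/])    # or a "*" not followed by a slash
    )*
    \*/               # closing "*/"
!x;
my $line_comment = qr!//[^\n]*!;

# Hypothetical helper: true if the whole string is exactly one comment.
sub is_comment {
    my ($s) = @_;
    return $s =~ /\A(?:$c_comment|$line_comment)\z/ ? 1 : 0;
}

my @samples = (
    '/* plain */',
    "/* multi\nline */",
    '/* stars **/',
    '/**/',
    '// to end of line',
    '/* unterminated',
    '/* a */ x /* b */',    # two comments, must not match as one
);

for my $s (@samples) {
    (my $shown = $s) =~ s/\n/\\n/g;    # keep the report on one line
    printf "%-9s %s\n", (is_comment($s) ? 'match' : 'no match'), $shown;
}
```

The last two samples are the interesting ones: an unterminated comment is rejected, and the C-style pattern cannot run past the first "*/".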

... and then sprinkle this production liberally throughout the rest of the grammar:

# Yuck!
rule1: comment(s?) subrule1 | comment(s?) subrule2
rule1: comment(s?) subrule3 comment(s?) subrule4

Can any monks out there lead me down the correct path? While I would expect the above to do what I want, it feels like a hack. Is there a smart way around this?

Thanks for your help.

Update 1: Fixed c-comment regex as per Anonymous Monk's suggestion.
Update 2: The "big picture":

I am parsing configuration files, with a goal of amending small parts of them while leaving the majority of the file unchanged. So even though comments and whitespace are irrelevant to the semantics of the file, I need to keep track of them so they can be rendered when I print the amended version of the configuration file back out.

So if I were to change "sub-setting1" in the example below from "value1" to "value2", I would want to go from...

setting = (
    sub-setting1 = "value1";
    optimize     = "12"; // unreliable!

    /* foo = "bar"; Commented out, not working right now*/
);

... to ...

setting = (
    sub-setting1 = "value2";
    optimize     = "12"; // unreliable!

    /* foo = "bar"; Commented out, not working right now*/
);

The point being, I don't lose the formatting.
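
To make the round-trip requirement concrete, here is a rough sketch in plain Perl, deliberately without Parse::RecDescent. It assumes a lossless token list in which comments and whitespace are kept as tokens of their own. The names tokenize and set_value are hypothetical, and the lookup is naive: it changes the first string literal that follows the setting name.

```perl
use strict;
use warnings;

# Hypothetical lossless lexer: every character of the input ends up in
# exactly one token, so joining the token texts reproduces the input.
sub tokenize {
    my ($text) = @_;
    my @tokens;
    pos($text) = 0;
    while (pos($text) < length($text)) {
        if    ($text =~ m{\G(/\*.*?\*/|//[^\n]*)}gcs)  { push @tokens, [comment => $1] }
        elsif ($text =~ m{\G(\s+)}gc)                  { push @tokens, [space   => $1] }
        elsif ($text =~ m{\G("(?:[^"\\]|\\.)*")}gcs)   { push @tokens, [string  => $1] }
        elsif ($text =~ m{\G([\w-]+)}gc)               { push @tokens, [word    => $1] }
        else  { $text =~ m{\G(.)}gcs;                    push @tokens, [punct   => $1] }
    }
    return @tokens;
}

# Naive update: replace the first string token after the named word.
# Text inside comment tokens is never touched.
sub set_value {
    my ($tokens, $name, $new) = @_;
    for my $i (0 .. $#$tokens) {
        next unless $tokens->[$i][0] eq 'word' && $tokens->[$i][1] eq $name;
        for my $j ($i + 1 .. $#$tokens) {
            next unless $tokens->[$j][0] eq 'string';
            $tokens->[$j][1] = qq{"$new"};
            return 1;
        }
    }
    return 0;
}

my $config = <<'END';
setting = (
    sub-setting1 = "value1";
    optimize     = "12"; // unreliable!

    /* foo = "bar"; Commented out, not working right now*/
);
END

my @tokens = tokenize($config);
set_value(\@tokens, 'sub-setting1', 'value2');
my $rendered = join '', map { $_->[1] } @tokens;

(my $expected = $config) =~ s/value1/value2/;
print $rendered eq $expected ? "formatting preserved\n" : "formatting lost\n";
```

Whatever the parser ends up looking like, this is the property it has to keep: the comments and whitespace must survive somewhere in the parse result so they can be printed back out unchanged.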

¹ Not sure these are the correct terms.
² While it is not the primary purpose of this question, if anybody spots an error in my comment-regex, please let me know!

In reply to Handling Comments with Parse::RecDescent by crashtest
