I have always wondered about the answer to this question, but never stuck with it long enough to figure it out, so I'm hoping someone here can help. Say you have this C source code:
/* this is a test
 * c file /* with an embedded comment */
 * in the middle
 * */
int main(void) { int a; int b; }
Now say you want to write a regex that strips all C comments out of that file. My idea was to use a non-greedy regex to strip out only the non-embedded (innermost) comments first, and then do multiple passes to make sure every comment is gone.
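A quick Perl sketch of why a single non-greedy pass is not enough on its own. It assumes the first example has a line break before each leading star (a guess at its original layout). The pattern /\*.*?\*/ stops at the first "*/" it sees, and in this input that closer belongs to the embedded comment, so the tail of the outer comment is left behind.

```perl
use strict;
use warnings;

# First example from the question (the line breaks are an assumed layout).
my $src = "/* this is a test\n"
        . " * c file /* with an embedded comment */\n"
        . " * in the middle\n"
        . " * */\n"
        . "int main(void) { int a; int b; }\n";

# One non-greedy pass: /\*.*?\*/ ends at the FIRST "*/", which here is the
# embedded comment's closer, so the outer comment's tail survives.
(my $naive = $src) =~ s{/\*.*?\*/}{}gs;
print $naive;
# Leaves " * in the middle" and " * */" in front of int main(...).
```

Note that a second pass of the same regex cannot repair this, because the "/*" that opened the outer comment has already been removed. That is why the multi-pass idea needs each pass to take only innermost comments.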

The catch is that I only know how to do single-character negation, with something like this:

$var =~ s/abc[^xyz]*?def/abcdef/;
That will match strings like "abcjjjkkklllmmmdef" and turn them into "abcdef", but it will NOT match if there is an x, y, or z in the middle part, as in "abcjjjkkkxlllmmmdef".
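To pin down what the character class does, and why it cannot express "no '*/' in here", a small Perl sketch of the substitution run on the two strings from the paragraph above. %result is just a hypothetical name for collecting the output.

```perl
use strict;
use warnings;

# [^xyz] matches any ONE character except x, y or z, so [^xyz]*? can only
# cross a stretch of text that contains none of those three characters.
my %result;
for my $s ("abcjjjkkklllmmmdef", "abcjjjkkkxlllmmmdef") {
    (my $out = $s) =~ s/abc[^xyz]*?def/abcdef/;
    $result{$s} = $out;
    print "$s -> $out\n";
}
# abcjjjkkklllmmmdef -> abcdef
# abcjjjkkkxlllmmmdef -> abcjjjkkkxlllmmmdef   (no match: 'x' in the middle)
```

A class always describes a single character, which is exactly the limitation the question runs into: "*" and "/" are each allowed on their own inside a comment, and only the pair "*/" must be excluded.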

Now back to comments in C source code: single-character negation doesn't help here. What I need is to forbid a two-character sequence ("/*" or "*/") in the middle part.
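For comparison, in standard C comments do not nest: the first "*/" closes the comment. For that non-nesting case, a two-character exclusion can in fact be built from character classes alone, using an "unrolled" pattern. This Perl sketch does not solve the nested puzzle by itself, but it shows the trick: after a run of stars, the next character must be either "/" (ending the comment) or something that is neither "/" nor "*". The sample line is made up for illustration.

```perl
use strict;
use warnings;

# A standard (non-nesting) C comment, matched with character classes only:
#   /\*                   the opening "/*"
#   [^*]*                 a run of non-star characters
#   \*+                   one or more stars
#   (?:[^/*][^*]*\*+)*    a char that is neither "/" nor "*", more non-stars,
#                         more stars (repeated as needed)
#   /                     the slash that completes "*/"
my $c_comment = qr{/\*[^*]*\*+(?:[^/*][^*]*\*+)*/};

my $code = "int a; /* a ** starry * comment */ int b; /* x */\n";
(my $stripped = $code) =~ s/$c_comment//g;
print $stripped;    # "int a;  int b; " followed by a newline
```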

I've looked at negative lookahead and negative lookbehind, but I don't think those work either.
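Negative lookahead can in fact express "not the start of a two-character sequence". A minimal Perl sketch of the multi-pass idea from the question, assuming the comments in the input are balanced: the run (?:(?!/\*|\*/).)* consumes a character only where the text does not begin with "/*" or "*/", so every match is an innermost comment, and repeating the substitution until nothing changes peels the nesting from the inside out. strip_nested_comments is a hypothetical helper name, and the sample string uses an assumed line layout for the first example.

```perl
use strict;
use warnings;

# Body = (?: (?!/\*|\*/) . )*
# Each body character is consumed only if the text at that spot does not
# start with "/*" or "*/", so a match is always an innermost comment.
my $innermost = qr{/\*(?:(?!/\*|\*/).)*\*/}s;

# Hypothetical helper: strip innermost comments, then repeat so that each
# pass exposes the next enclosing level. Assumes balanced comments.
sub strip_nested_comments {
    my ($text) = @_;
    1 while $text =~ s/$innermost//g;    # s///g returns false once nothing matches
    return $text;
}

my $src = "/* this is a test\n"
        . " * c file /* with an embedded comment */\n"
        . " * in the middle\n"
        . " * */\n"
        . "int main(void) { int a; int b; }\n";

print strip_nested_comments($src);    # a blank line, then int main(...)
```

Each pass removes one level of nesting, so the loop runs once per level plus one final pass that finds nothing. Lone stars inside a comment are fine, since "*" is only refused when it is directly followed by "/".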

Any regex experts out there that can answer this puzzle?

NOTE: the solution also must not assume there are no stars inside the embedded comment. In other words, it should handle this too:

/* this is a test
 * c file /* with an embedded
 * multiline
 * comment */
 * in the middle
 * */
int main(void) { int a; int b; }
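If Perl 5.10 or newer is available, a recursive pattern can do the same job in a single pass. This is an alternative technique, not something from the question: group 1 matches "/*", then any mix of ordinary characters (again refusing the start of "/*" or "*/") and complete nested comments via (?1), then "*/". Stars inside the embedded comment are harmless for the same reason as before. Both test strings are guesses at the examples' original line layout.

```perl
use strict;
use warnings;

# Requires Perl 5.10+ for (?1) recursion.
my $nested_comment = qr{
    (                       # group 1: one comment, possibly containing others
        /\*                 # opener
        (?:
            (?!/\*|\*/) .   # a char that does not begin an opener or closer
          | (?1)            # or a complete nested comment (recurse into group 1)
        )*
        \*/                 # closer
    )
}xs;

my @examples = (
    # First example (assumed line layout).
    "/* this is a test\n * c file /* with an embedded comment */\n"
      . " * in the middle\n * */\nint main(void) { int a; int b; }\n",
    # Second example: stars inside the embedded comment.
    "/* this is a test\n * c file /* with an embedded\n * multiline\n"
      . " * comment */\n * in the middle\n * */\nint main(void) { int a; int b; }\n",
);

my @results;
for my $src (@examples) {
    (my $out = $src) =~ s/$nested_comment//g;
    push @results, $out;
    print $out;
}
```

The two alternatives inside the loop cannot both start at the same position (one refuses "/*", the other requires it), which keeps backtracking modest. An unbalanced opener simply fails to match and is left in place.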

Justin Eltoft

"If at all god's gaze upon us falls, its with a mischievous grin, look at him" -- Dave Matthews


In reply to need regex help to strip things like embedded C comments by Eradicatore
