Frankly, the way to handle this is with a parser: something that marches through the input character by character, maintains state information, and hands back chunks of the data labeled with the categories you want, comment vs. not-comment. (Since the comment delimiters are two characters long, the parser has to check the following character whenever it sees the first character of a possible delimiter.)

The state information you need to maintain in this case is which of three states you are in: "not-in-comment-or-quote", "in-quote", or "in-comment". You start out in the first of those, and as soon as you enter either of the others (by detecting an open-quote or open-comment), nothing else matters until you detect the character (or character pair) that takes you out of that state, putting you back in "not-in-comment-or-quote". In particular, a comment opener inside a quoted string is not a comment, and a quote mark inside a comment is not a quote.
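
To make the three-state idea concrete, here is a minimal sketch. It is written in Python only for illustration, and the same loop carries over to Perl almost line for line. The function names split_comments and strip_comments are made up for this example. It assumes only /* ... */ comments and double-quoted strings with backslash escapes; // comments and single-quoted character literals such as '"' are not handled.

```python
def split_comments(text):
    """Split C-like source into ("code", chunk) and ("comment", chunk) pairs.

    Hypothetical helper: handles /* ... */ comments and double-quoted
    strings with backslash escapes, nothing else.
    """
    CODE, QUOTE, COMMENT = "code", "quote", "comment"
    state = CODE
    chunks = []  # finished (kind, text) pairs
    buf = []     # pieces of the chunk currently being built
    i, n = 0, len(text)

    def flush(kind):
        # Emit the current buffer as one chunk, if it holds anything.
        if buf:
            chunks.append((kind, "".join(buf)))
            buf.clear()

    while i < n:
        ch = text[i]
        pair = text[i:i + 2]  # current character plus one of lookahead
        if state == CODE:
            if pair == "/*":
                flush("code")
                buf.append(pair)
                state = COMMENT
                i += 2
                continue
            if ch == '"':
                state = QUOTE
            buf.append(ch)
            i += 1
        elif state == QUOTE:
            if ch == "\\" and i + 1 < n:
                buf.append(pair)  # keep the escaped character, e.g. \"
                i += 2
                continue
            if ch == '"':
                state = CODE
            buf.append(ch)
            i += 1
        else:  # COMMENT: only the closing pair matters
            if pair == "*/":
                buf.append(pair)
                flush("comment")
                state = CODE
                i += 2
                continue
            buf.append(ch)
            i += 1

    # Input ended: an unterminated comment is still reported as a comment.
    flush("comment" if state == COMMENT else "code")
    return chunks


def strip_comments(text):
    """Return text with every comment chunk dropped."""
    return "".join(chunk for kind, chunk in split_comments(text) if kind == "code")
```

Quoted strings are reported as part of the "code" chunks, so a /* that appears inside a string is left alone. Calling strip_comments on a whole file's contents gives back the file without its comments; if you want to do something else with the comments (count them, collect them), work from the list that split_comments returns instead.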

So look at Parse::RecDescent -- I suspect that someone has already come up with a parser spec to handle C-like comments.


In reply to Re: need regex help to strip things like embedded C comments by graff
in thread need regex help to strip things like embedded C comments by Eradicatore
