sonic has asked for the wisdom of the Perl Monks concerning the following question:

I'm looking to kill C style (/* */) comments that don't nest.

I'm using the following line:

s/\/\*.*\*\///g;

but that's interpreting the "not comment" as part of the comment:

/*comment*/ not comment /* */

Is there a way to make the regex not greedy? Or should I just write a new and much more complex regex?

Re: Kill C style comments
by waswas-fng (Curate) on Mar 22, 2004 at 05:54 UTC
    use Regexp::Common; $code_string =~ s/$RE{comment}{C}//g;


    -Waswas
      Is that (Regexp/Common.pm) a standard module? I don't see it on my own system.
Re: Kill C style comments
by jweed (Chaplain) on Mar 22, 2004 at 06:12 UTC
    I highly suggest you pick up a copy of Jeff Friedl's "Mastering Regular Expressions", where this example is discussed at great length. The final answer is:
    s{("(?:\\.|[^"\\])*")|(?:/\*[^*]*\*+(?:[^/*][^*]*\*+)*/)}{defined $1 ? $1 : ""}ge
    Which nicely "unrolls the loop" for double-quote-aware comment removal. Pretty slick, huh? Check the book out ASAP!
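For the double-quoted strings to survive, the string alternative has to be captured and substituted back, so that only the comment alternative is replaced with nothing. A minimal sketch of that idea follows; the sample input and the variable names ($c_code, $no_comments) are made up for illustration and are not from the book.

```perl
use strict;
use warnings;

# Hypothetical sample: a comment delimiter inside a string, then a real comment.
my $c_code = 'char *s = "/* not a comment */"; /* real */ int x;';

# $1 is set only when the string alternative matched; put it back unchanged.
# When the comment alternative matched, $1 is undefined and we emit "".
(my $no_comments = $c_code) =~
    s{("(?:\\.|[^"\\])*")|/\*[^*]*\*+(?:[^/*][^*]*\*+)*/}{defined $1 ? $1 : ""}ge;

print "$no_comments\n";
```

The string literal is left intact and only the trailing comment is removed.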



    Code is (almost) always untested.
    http://www.justicepoetic.net/
      That still won't handle '/*' (single quotes), which is in fact valid C: a multi-character constant whose value is implementation-defined, typically ('/' << 8) + '*'.
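One way to cover that case, sketched here under the assumption that single-quoted constants can be treated just like strings, is to add a second quoted alternative inside the capture. The sample input and names ($c_src, $stripped) are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sample: '/*' as a character constant, a real comment, and a string.
my $c_src = q{int a = '/*'; /* gone */ char *s = "/* kept */";};

(my $stripped = $c_src) =~ s{
    ( "(?:\\.|[^"\\])*"      # double-quoted string literal
    | '(?:\\.|[^'\\])*'      # single-quoted character constant
    )
  | /\*[^*]*\*+(?:[^/*][^*]*\*+)*/   # a C comment
}{defined $1 ? $1 : ""}gex;

print "$stripped\n";
```

Anything matched by the quoted alternatives is put back as-is, so only the real comment disappears.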
Re: Kill C style comments
by eyepopslikeamosquito (Archbishop) on Mar 22, 2004 at 07:06 UTC
    It's discussed in perlfaq6. See the question "How do I use a regular expression to strip C style comments from a file?"

Re: Kill C style comments
by pbeckingham (Parson) on Mar 22, 2004 at 05:53 UTC

    You could try:

    s/\/\*.*?\*\///g;

    Which stops the regex being greedy. Don't forget that your regex takes no account of string literals, so comment characters that occur within strings will be treated as comments too.
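A small sketch of the difference, using the sample line from the question. The multi-line input and the variable names are made up for illustration. Note also that without the /s modifier the dot does not match a newline, so a comment spanning several lines is left alone.

```perl
use strict;
use warnings;

my $src = "/*comment*/ not comment /* */";

# Greedy: .* runs to the last */ and swallows everything in between.
(my $greedy = $src) =~ s{/\*.*\*/}{}g;

# Non-greedy: .*? stops at the first */ it can.
(my $lazy = $src) =~ s{/\*.*?\*/}{}g;

# Hypothetical multi-line comment: needs /s so that . matches "\n".
my $multi = "a /* one\ntwo */ b";
(my $without_s = $multi) =~ s{/\*.*?\*/}{}g;
(my $with_s    = $multi) =~ s{/\*.*?\*/}{}gs;

print "greedy: [$greedy]\n";
print "lazy:   [$lazy]\n";
print "with /s: [$with_s]\n";
```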

      Ah yeah, thanks pbeckingham, I thought that might be too simple.
Re: Kill C style comments
by flyingmoose (Priest) on Mar 22, 2004 at 13:20 UTC
    s/\/\*.*\*\///g;

    Slightly OT, but when your regex is going to contain a lot of backslashes, I find it is a lot easier to read if you change the regex delimiter to something other than '/', like maybe '#' or '!'. Also you can use character classes (square brackets) around characters like '/' to help out some too. I can't stand reading regexes that contain escaped slashes and backslashes that still use '/' as the delimiter. They can cause brain damage.
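For illustration, here is the non-greedy version of the substitution written three ways; the sample input and variable names are hypothetical. All three should behave the same, only the delimiters and the escaping differ.

```perl
use strict;
use warnings;

my $src = "x = 1; /* set x */ y = 2;";

# Default delimiter: every slash in the pattern must be escaped.
(my $slashes = $src) =~ s/\/\*.*?\*\///g;

# '#' as the delimiter: slashes can be written plainly.
(my $hashes = $src) =~ s#/\*.*?\*/##g;

# Braces as delimiters, plus character classes instead of escaped stars.
(my $braces = $src) =~ s{/[*].*?[*]/}{}g;

print "$_\n" for $slashes, $hashes, $braces;
```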

    But yeah, if Regexp::Common or something already posted here can deal with it, good deal.