prassi has asked for the wisdom of the Perl Monks concerning the following question:

Hi Perl Monk,

I need to remove C-style comments from a source file. Below is an example, main.c:

/*This is the file which is used for the developing the code
for multiple IO operation
Author : Grey*/
void main()
{
    int a, b;
    int d; //This is the temp copy buffer.
    a = 10;
    b = 11;//This is the temp value assigned to the variables
    c = a+b;
    d = c+1; /*I am copying this with a increment*/
}

I was able to remove all the other comments, but I couldn't remove the comment on the first 3 lines using the regex. Can you please help me with this?

Regards, -Prasad

Replies are listed 'Best First'.
Re: C style multiple line comment removal
by davido (Cardinal) on Jun 08, 2012 at 09:43 UTC

    Ok... You're unnecessarily (and possibly mistakenly) reading from the input filehandle twice; once in line mode, and once in slurp mode. The first read is removing a single line from the top of the C code, but if you're trying to be that specific in how you remove the comment you may as well just open up an editor and clip the first line.

    I think you're looking for a general solution that will remove C style comments from source code.

    I didn't even try to unravel the regular expression you posted. (Update: It looks like the one in perlfaq6.) The problem of removing C style comments from source code can be simpler than that (with a possible killer exception discussed below), because C style comments have the oft-hated feature of not being able to be nested. In other words, you can't do this with C style comments:

    /* This is a comment /* and this is a nested one */ */

    ...because C sees it as:

    /* This is a comment /* and this is a nested one */

    ...with a trailing unmatched */, which is a syntax error. That's unfortunate for people who like to comment out blocks of code that may also contain comments, without resorting to preprocessor directives. But for you, today, it's a great feature. It means you don't have to keep track of state. You don't need to push comment brackets onto a stack, parse parse parse, pop them off as closing brackets are seen, and so on. Today is a good day for you.
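    The no-nesting rule is easy to check. Here is a minimal sketch, assuming a hypothetical helper named strip_block_comments built on a plain non-greedy match; it is only an illustration, not a complete comment stripper:

```perl
use strict;
use warnings;

# Hypothetical helper: delete everything from each "/*" up to the
# nearest following "*/". No stack or nesting depth is tracked.
sub strip_block_comments {
    my ($src) = @_;
    $src =~ s{ /\* .*? \*/ }{}gsx;
    return $src;
}

# The first "*/" closes the comment, which is how a C compiler reads it too.
my $nested = "/* This is a comment /* and this is a nested one */ */";
my $left   = strip_block_comments($nested);

print "[$left]\n";    # prints "[ */]", the unmatched trailer
```

    What remains is exactly the stray */ that the compiler would complain about, so the regex and the language agree on where the comment ends.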

    Consider the following code snippet:

    local $/ = undef;
    $_ = <DATA>;
    s{ /\* .*? \*/ }{}gsmx;
    print;

    __DATA__
    /*This is the file which is used for the developing the code
    for multiple IO operation
    Author : Grey*/int myfunc();/* remove */ void main();
    /*This is another comment */
    void main()
    {
        int a, b;
        int d; //This is the temp copy buffer.
        a = 10;
        b = 11;//This is the temp value assigned to the variables
        c = a+b;
        d = c+1; /*I am copying this with a increment*/
    }

    This will produce the following output:

    int myfunc(); void main();

    void main()
    {
        int a, b;
        int d; //This is the temp copy buffer.
        a = 10;
        b = 11;//This is the temp value assigned to the variables
        c = a+b;
        d = c+1;
    }

    It's a lucky day when non-greedy matching actually works like you want. There's still a problem though, so the day may not be so lucky after all. The problem is explained in this article: A /* token that is commented out with a // style comment will still be picked up by the regular expression, resulting in whatever comes after being dropped until another */ is found. That may be a problem for you in your source code, or it may not, but it's kind of a risk.
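    To make that risk concrete, here is a small sketch with made-up input (the three C lines are hypothetical, chosen only to trigger the problem):

```perl
use strict;
use warnings;

# Hypothetical input: a "/*" hidden inside a "//" comment on the first line.
my $src = <<'END_C';
int d; // temp buffer /* looks like a comment start
a = 10;
b = 11; /* real comment */
END_C

# Same non-greedy substitution as above, applied to a copy of the source.
(my $stripped = $src) =~ s{ /\* .*? \*/ }{}gsx;

print $stripped;
```

    The match starts at the /* inside the // comment and runs to the first */ it can find, two lines later, so both assignments disappear along with the comment.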

    Unfortunately, that same flaw exists in Regexp::Common:

    use Regexp::Common qw( comment );

    local $/ = undef;
    $_ = <DATA>;
    s/$RE{comment}{C}//gs;
    print;

    __DATA__
    /*This is the file which is used for the developing the code
    for multiple IO operation
    Author : Grey*/int myfunc();/* remove */ void main();
    /*This is another comment */
    void main()
    {
        int a, b;
        int d; //This is the temp copy buffer. /* This breaks regex solutions
        a = 10;
        b = 11;//This is the temp value assigned to the variables
        c = a+b;
        d = c+1; /*I am copying this with a increment*/
    }

    If your source contains C comment delimiters inside C++ style comments, you've got a problem that is better served by a proper lexer and parser.
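    Short of a full lexer, one common workaround is to let a single pattern consume string literals and // comments as whole tokens, so that a /* inside them is never seen as a comment start. The following is only a sketch under that assumption: the helper name strip_c_block_comments is hypothetical, and it ignores backslash-continued // lines, trigraphs, and preprocessor oddities.

```perl
use strict;
use warnings;

# Hypothetical helper: removes /* ... */ comments but copies string
# literals, character literals, and // comments through unchanged.
sub strip_c_block_comments {
    my ($src) = @_;
    $src =~ s{
        ( " (?: \\. | [^"\\] )* "     # double-quoted string literal
        | ' (?: \\. | [^'\\] )* '     # character literal
        | // [^\n]*                   # C++ style comment, kept
        )
      | /\* .*? \*/                   # C style comment, dropped
    }{ defined $1 ? $1 : '' }gsex;
    return $src;
}

my $src = <<'END_C';
int d; // temp buffer /* not a comment start
a = 10; /* real comment */
char *s = "/* inside a string */";
END_C

my $clean = strip_c_block_comments($src);
print $clean;
```

    Only the "real comment" is removed here; the // line and the string literal come through untouched because the first alternative wins whenever the scan reaches them before a /*.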

    Now please help yourself out by reading perlintro.

    Update: The regexp you posted skirts the issue of having a /* embedded within a // (C++ style) comment by just removing both the C style, and the C++ style comments. If it was your intent to remove both C style and C++ style, I wish I had known so I could have saved myself some research. ...at least I found a bug in Regexp::Common that I will report. I wish I could provide a patch at the same time but I haven't figured out a pure regex solution yet.


    Dave

Re: C style multiple line comment removal
by toolic (Bishop) on Jun 08, 2012 at 11:08 UTC
Re: C style multiple line comment removal
by ckj (Chaplain) on Jun 08, 2012 at 08:04 UTC
    Well, if you're reading the file main.c and then doing the regex operation then I will go with this:
    $str = <FILEHANDLE>; while($str=~s/\/\*(.*?)\*\///gixm){ print $str; }

      This method didn't remove the comment on the first 3 lines of the code. I have tried it, but since it reads only a single line from the file, removing the part of the comment from the 2nd line onward is a problem.
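      The difference between reading one line and reading the whole file can be seen in a short sketch. It assumes an in-memory filehandle standing in for main.c, and the variable names are illustrative only:

```perl
use strict;
use warnings;

# Stand-in for main.c: a comment that spans three lines.
my $c_source = <<'END_C';
/*This is the file which is used
for multiple IO operation
Author : Grey*/
void main() { }
END_C

# Line mode: <$fh> returns only the first line, which has no closing "*/".
open my $fh, '<', \$c_source or die "Cannot open in-memory file: $!";
my $first_line = <$fh>;
my $changed    = ( $first_line =~ s{/\*.*?\*/}{}gs );
close $fh;

# Slurp mode: with $/ undefined, <$fh> returns the whole file as one string.
open $fh, '<', \$c_source or die "Cannot open in-memory file: $!";
my $whole_file = do { local $/; <$fh> };
close $fh;
$whole_file =~ s{/\*.*?\*/}{}gs;    # /s lets "." match the newlines

print "line mode replaced: ", ( $changed ? "yes" : "no" ), "\n";
print $whole_file;
```

      In line mode the substitution finds nothing to replace, because the closing */ is two lines further down. Once the file is slurped and /s is in effect, the same pattern removes the whole three-line comment.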

Re: C style multiple line comment removal
by Anonymous Monk on Jun 08, 2012 at 08:16 UTC

    I was able to remove all the other comments, but I couldn't remove the comment on the first 3 lines using the regex. Can you please help me?

    Show your code :)

Re: C style multiple line comment removal
by locked_user sundialsvc4 (Abbot) on Jun 08, 2012 at 14:16 UTC

    As an aside, consider that the usual text-file I/O handlers look for newline delimiters in order to split the incoming file into individual strings for you ... and that maybe you don’t actually want that here.

    It is probably reasonable to assume that all of the files you are dealing with are small enough to simply slurp into memory all at once. (After all, text editors do that.) Hence, you can use Perl to read the entire file contents into one string variable, which will of course contain the newline sequences. In this case you don’t care about them, because a comment should match whether or not it spans lines. Now your regular expression can simply search for the /*...*/ sequence in what is now a single string, irrespective of any newline characters therein. You use the /g modifier to apply the regular expression repeatedly to the string, and the /s modifier so that "." also matches newlines.

    If your purpose is simply to remove the comments, you can use the regular expression to replace each matching comment with an empty string, globally throughout the entire string. (So the problem is solved with one regular expression applied one time to the slurped content of the file, with no looping or hairy string manipulation. Your friendly neighborhood Swiss Army Knife™ at your service...)

    As previously noted, this is of course a problem that has been solved before, and whose solution is well-documented. Find, fetch, and examine that source-code snippet for a concrete example of the application of this idea in practice.