Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I am at a company that uses JavaScript, and the company wants to obfuscate its code. So what did they ask me to do? Obfuscate the code using Perl. Ironic that I am using Perl to fix JavaScript. I am pretty close to completing my obfuscation code, but I am having a problem with a regular expression. Is there a regular expression that would delete C-style comments? I have already removed the single-line "//" comments. I am having trouble with the multi-line /* comment comment comment */ ones. If there isn't a regular expression to handle such a beast, any ideas on how to pull the /* */ comments out? Thanks, Brett

Replies are listed 'Best First'.
RE: Multi Line c comments
by japhy (Canon) on Nov 15, 2000 at 23:40 UTC
    This is shown in the Perl FAQ: "How do I use a regular expression to strip C style comments from a file?" in Perl FAQ section 6.

    $_="goto+F.print+chop;\n=yhpaj";F1:eval
(Ovid) RE: Multi Line c comments
by Ovid (Cardinal) on Nov 15, 2000 at 23:39 UTC
    $text =~ s|/\*         # First slash and star /*
               (?:         # Non-backreferencing parentheses
                  (?!\*/)  # not a star slash */
                  .        # ok to inch along
               )*          # Zero or more
               \*/||sx;    # Followed by a star slash */
    This should work for you. See Death to Dot Star! for details. Be careful on this one. It's really tricky.
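As written, the substitution above removes only the first comment it finds. A minimal sketch of how it might be applied to a whole file follows, assuming the source has been read into a single string; the helper name strip_c_comments and the sample JavaScript are made up for illustration and are not from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): removes every /* ... */ comment.
sub strip_c_comments {
    my ($text) = @_;
    $text =~ s{
        /\*            # opening slash-star
        (?:            # non-capturing group
            (?!\*/)    # stop as soon as the closing star-slash is next
            .          # otherwise consume one character (newlines too, thanks to the s flag)
        )*
        \*/            # closing star-slash
    }{}gsx;
    return $text;
}

# Made-up sample input with one multi-line comment and one with extra asterisks.
my $js = <<'END_JS';
var a = 1; /* first
   comment */
var b = 2; /* second ** comment **/
END_JS

print strip_c_comments($js);
```

The /g flag is the only change to the pattern's behavior. Note that this sketch knows nothing about string or regex literals, so a "/*" inside a JavaScript string would be stripped as well; the FAQ entry discusses that case.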

    With all due respect to kilinrax, his (her?) regex can fail under some circumstances:

    my $text = 'foo /* bar **/ baz /* ack! ph! */';
    $text =~ s| \/\*     # '/*', escaped
                [^\*]*   # 0 or more non-'*' characters
                \*\/     # '*/', escaped
              ||x;
    print $text;
    This prints foo /* bar **/ baz. The extra asterisk at the end of the first C comment throws off the regex.
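To see the difference side by side, here is a small sketch that runs both patterns from this thread over the same string. The only assumption beyond the posted code is the added /g flag, so that each pattern gets a chance at every comment.

```perl
use strict;
use warnings;

my $input = 'foo /* bar **/ baz /* ack! ph! */';

# Character-class version from the thread, with /g added.
(my $naive = $input) =~ s|/\*[^*]*\*/||g;

# Negative-lookahead version from the thread, with /g added.
(my $lookahead = $input) =~ s|/\*(?:(?!\*/).)*\*/||gs;

print "naive:     '$naive'\n";
print "lookahead: '$lookahead'\n";
```

The character-class version skips the first comment because "[^*]*" stops at the first asterisk, and that asterisk is followed by another asterisk rather than a slash; the lookahead version removes both comments.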

    Cheers,
    Ovid

    Join the Perlmonks Setiathome Group or just click on the link and check out our stats.

      Ahh. This wasn't here when I started typing my reply. I think you said it better than me anyway :)
RE: Multi Line c comments
by kilinrax (Deacon) on Nov 15, 2000 at 23:28 UTC
    Update: This code, as pointed out below, will not work if the text in the comment contains any asterisks.

    See How do I use a regular expression to strip C style comments from a file? for a regexp that will work in all cases.


    I am pretty sure that this would work:
    $text =~ s|\/\*[^\*]*\*\/||;
    Breaking that down a bit (I can appreciate how it might just look like a bunch of asterisks and slashes, but I guess that's probably a good thing as far as your obfuscation goes ;-):
    $text =~ s| \/\*     # '/*', escaped
                [^\*]*   # 0 or more non-'*' characters
                \*\/     # '*/', escaped
              ||x;
      Hmm. Not sure about this. I seem to remember that getting rid of C comments was a pain - Death to Dot Star. Specifically Merlin's Post and Ovid's Responses to that thread.

      I know there is no dot-star in that regex, and it uses a negated character class, but I am worried that it might fall into the same trap as the ones in the above nodes. I suppose I could try it to see.

      e.g. is this a valid C comment:
      /* This my comment
         It spans a couple*
         of lines.
         * actually it spans 5 lines
         but it has 3 *s in it */

      It seems to me that the above regex won't work here. Maybe it's not a valid C comment.

      Actually, I don't know what I'm talking about - I have been stuck at work today for too long - I just needed to node. I am going for a break.
      Considering that this is a FAQ, I am a little scared by how many people are getting it wrong.

      This has a very common failure mode that looks like this:

      /*****************************************
       * Script Foo                            *
       * Written by "I like pretty comments"   *
       * The purpose of this is...             *
       * etc/and more...                       *
       ******************************************/
      This fairly generic example (and common variations) will defeat most simplistic attempts at this problem. :-)
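A quick way to check a candidate regex against this failure mode is to feed it a boxed comment. The sketch below uses a shortened, made-up banner and compares the character-class pattern from earlier in the thread with a commonly cited "unrolled" pattern; the unrolled pattern is offered here as an assumption for illustration, not as a quotation from the FAQ or from any poster.

```perl
use strict;
use warnings;

# Shortened, made-up banner in the style shown above.
my $source = <<'END_SRC';
/*******************
 * Script Foo      *
 * etc/and more... *
 *******************/
var x = 1;
END_SRC

# Character-class pattern from the thread: needs "*/" right after a run of non-asterisks.
(my $naive = $source) =~ s|/\*[^*]*\*/||g;

# Unrolled pattern (assumption): after each run of asterisks, either close with "/"
# or continue with a character that is neither "/" nor "*".
(my $unrolled = $source) =~ s{/\*[^*]*\*+(?:[^/*][^*]*\*+)*/}{}g;

print "naive left the banner in place\n" if $naive eq $source;
print $unrolled;
```

The naive pattern never matches because the opening "/*" is immediately followed by more asterisks, while the unrolled pattern treats runs of asterisks as ordinary comment text until one is directly followed by a slash.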

      (See the FAQ for more.)

        You are correct, my regexp was naive and wrong, and not remembering something in the FAQ is pretty inexcusable :-(