in reply to Re: Extracting C Style Comments Revised (JavaScript)
in thread Extracting C Style Comments Revised (JavaScript)

Incorporating what you've added as input:

New Code

$data =~ s{
    # First, we'll list things we want
    # to match, but not throw away
    (
        (?:/[^\r\n\*\/]+/)                      # Match RegExp
      |                                         # -or-
        [^"'/]+                                 # other stuff
      |                                         # -or-
        (?:"[^"\\]*(?:\\.[^"\\]*)*" [^"'/]*)+   # double quoted string
      |                                         # -or-
        (?:'[^'\\]*(?:\\.[^'\\]*)*' [^"'/]*)+   # single quoted constant
    )
  |
    # or we'll match a comment. Since it's not in the
    # $1 parentheses above, the comments will disappear
    # when we use $1 as the replacement text.
    /                                           # (all comments start with a slash)
    (?:
        \*[^*]*\*+(?:[^/*][^*]*\*+)*/           # traditional C comments
      |                                         # -or-
        /[^\n]*                                 # C++ //-style comments
    )
}{$1}gsx;

Updated

This code works for the examples above, but it does not work for regular expressions containing a '*', for example:
var b=/\s*;\s*/gi;
There should be a way for us to do this, because we want to handle that 99% of code that is out there... without writing a parser...

I'm thinking we need to modify the regex in the "# Match RegExp" section further so that it accepts *s and \/s inside a pattern. This may not be easy; if I figure it out, I'll post it here.
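
To make the failure concrete, here is a minimal sketch that tests the "# Match RegExp" branch on its own. The helper name is_regexp_literal is made up for illustration and is not part of the substitution; the sketch only assumes that the branch is applied to a whole candidate literal.

```perl
use strict;
use warnings;

# The "# Match RegExp" branch from the New Code, isolated for testing.
my $regexp_branch = qr{/[^\r\n\*\/]+/};

# Hypothetical helper: returns 1 if the whole string is accepted
# as a regular expression literal by that branch, 0 otherwise.
sub is_regexp_literal {
    my ($text) = @_;
    return $text =~ /\A$regexp_branch\z/ ? 1 : 0;
}

for my $literal ('/abc/', '/\s*;\s*/') {
    printf "%-12s %s\n", $literal,
        is_regexp_literal($literal) ? 'accepted' : 'rejected';
}
```

The first literal is accepted and the second is rejected, because the character class excludes '*' and so the branch can never reach the closing slash.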

Re: Re: Re: Extracting C Style Comments Revised (JavaScript)
by Tetramin (Sexton) on Oct 24, 2001 at 01:32 UTC
    Try (?:/[^\r\n\*\/][^\r\n\/]*/)

    Still doesn't work with divisions like abc/100 because it now thinks it's the beginning of a regular expression.
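
    A small sketch of both effects, using only the suggested branch in isolation. The helper name first_regexp_like_span is hypothetical, and the second input line is an invented example with two divisions on one line:

```perl
use strict;
use warnings;

# The suggested replacement for the "# Match RegExp" branch, isolated.
my $suggested = qr{/[^\r\n\*\/][^\r\n\/]*/};

# Hypothetical helper: returns the first span that the branch would
# treat as a regular expression literal, or undef if there is none.
sub first_regexp_like_span {
    my ($code) = @_;
    return $code =~ /($suggested)/ ? $1 : undef;
}

# A real regular expression containing '*' is now recognized ...
print first_regexp_like_span('var b=/\s*;\s*/gi;'), "\n";

# ... but so is the text between two division operators.
print first_regexp_like_span('x = abc/100 + y/2;'), "\n";
```

    The first call finds /\s*;\s*/ as intended; the second finds /100 + y/, which is arithmetic and not a regular expression.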

      Possible Hack

      I think one way to do this may be to assume that every JavaScript regular expression follows an equals sign "=" or a left parenthesis "(".

      $data =~ s{
          # First, we'll list things we want
          # to match, but not throw away
          (
              (?:                                     # Match RegExp
                  [\(=]\s*                            # start with ( or =
                  / [^\r\n\*\/][^\r\n\/]* /           # All RegExps start and end
                                                      # with slash, but first one
                                                      # must not be followed by *
                                                      # and cannot contain newline
                                                      # chars
                                                      #
                                                      # var re = /\*/;
                                                      # a = b.match (/x/);
              )
            |                                         # -or-
              [^"'/]+                                 # other stuff
            |                                         # -or-
              (?:"[^"\\]*(?:\\.[^"\\]*)*" [^"'/]*)+   # double quoted string
            |                                         # -or-
              (?:'[^'\\]*(?:\\.[^'\\]*)*' [^"'/]*)+   # single quoted constant
          )
        |
          # or we'll match a comment. Since it's not in the
          # $1 parentheses above, the comments will disappear
          # when we use $1 as the replacement text.
          /                                           # (all comments start with a slash)
          (?:
              \*[^*]*\*+(?:[^/*][^*]*\*+)*/           # traditional C comments
            |                                         # -or-
              /[^\n]*                                 # C++ //-style comments
          )
      }{$1}gsx;
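
      As a quick check of the assumption, this sketch runs only the new "Match RegExp" branch against the two cases named in its comments plus a line with two divisions. The helper name regexp_span and the division line are invented for illustration:

```perl
use strict;
use warnings;

# The "Match RegExp" branch of the hack, isolated (whitespace removed,
# since this copy is not compiled with the /x flag).
my $hack_branch = qr{[\(=]\s*/[^\r\n\*\/][^\r\n\/]*/};

# Hypothetical helper: returns the span the branch would protect as a
# regular expression, or undef if the branch does not match anywhere.
sub regexp_span {
    my ($code) = @_;
    return $code =~ /($hack_branch)/ ? $1 : undef;
}

for my $code ('var re = /\*/;', 'a = b.match (/x/);', 'x = abc/100 + y/2;') {
    my $span = regexp_span($code);
    print "$code => ",
        defined $span ? "regexp: $span" : 'no regexp found', "\n";
}
```

      Both commented cases are picked up (including the leading "=" or "("), and the division line is left alone because neither slash follows "=" or "(".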

      Does anyone know how to improve on this or how to make it fail?