The following regex code, which removes comments from a chunk of JavaScript code, was developed with the help of several awesome Perl monks at this site.

The Code

    #----------------------------------------------------------------------
    # Here is the fundamental code to match JavaScript code.
    # This includes regular expressions and quoted strings.
    #----------------------------------------------------------------------
    my ($regexJSCode) = qr{
        # First, we'll list things we want
        # to match, but not throw away
        (?:
            # Match a regular expression (they start with ( or =).
            # Then they have a slash, and end with a slash.
            # The first slash must not be followed by * and cannot contain
            # newline chars. eg: var "re = /\*/;" or "a = b.match (/x/);"
            [\(=] \s* /
            (?:
                # char class contents
                \[ \^? ]? (?: [^]\\]+ | \\. )* ]
            |
                # escaped and regular chars (\/ and \.)
                (?: [^[\\\/]+ | \\. )*
            )*
            /
            (?:
                [gi]*
                # next characters are not word characters
                (?= [^\w] )
            )
        )
    |
        # or double quoted string
        (?: "[^"\\]* (?:\\.[^"\\]*)*" [^"'/]* )+
    |
        # or single quoted constant
        (?: '[^'\\]* (?:\\.[^'\\]*)*' [^"'/]* )+
    }x;

    #----------------------------------------------------------------------
    # Here is the fundamental code to match JavaScript comments and comment blocks.
    #----------------------------------------------------------------------
    my ($regexJSComments) = qr{
        # or we'll match a comment. Since it's not in the
        # $1 parentheses above, the comments will disappear
        # when we use $1 as the replacement text.
        /           # (all comments start with a slash)
        (?:
            # traditional C comments
            (?: \* [^*]* \*+ (?: [^/*] [^*]* \*+ )* / )
        |
            # or C++ //-style comments
            (?: / [^\n]* )
        )
    }x;

    #----------------------------------------------------------------------
    # Get rid of all comments from the string.
    #----------------------------------------------------------------------
    $strOutput =~ s{ ( $regexJSCode ) | $regexJSComments }{$1}gsx;

Input (Problems)

    function test (str) {
        // A comment.
        alert ("test");
        var reForwardSlash = /\//;
        var reBackslash = /\\/;
        if (str.match(regexForwardslash) && str.match(regexBackslash)) {
            return true;
        }
    }

Parsed Output (Incorrect)

The regex chokes on the reForwardSlash assignment; everything from the second slash of the regex literal to the end of that line is removed:

    function test (str) {
        alert ("test");
        var reForwardSlash = /\
        var reBackslash = /\\/;
        if (str.match(regexForwardslash) && str.match(regexBackslash)) {
            return true;
        }
    }

If we remove the alert ("test"); line, the input parses correctly, so the regex we have developed clearly has a problem. Here's a successful parse with the same regex, just different input.

Input (No Problems)

    function test (str) {
        // A comment.
        var reForwardSlash = /\//;
        var reBackslash = /\\/;
        if (str.match(regexForwardslash) && str.match(regexBackslash)) {
            return true;
        }
    }

Parsed Output (Correct)

This is the expected parse output.

    function test (str) {
        var reForwardSlash = /\//;
        var reBackslash = /\\/;
        if (str.match(regexForwardslash) && str.match(regexBackslash)) {
            return true;
        }
    }
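
Since the only difference between the two inputs is the alert ("test"); line, one way to narrow things down is to run the double-quoted-string branch of $regexJSCode on its own and print what it consumes. The following is only a minimal sketch: $reString and $js are names introduced here for illustration, and the branch is copied by hand from the regex above rather than taken from the full substitution.

```perl
use strict;
use warnings;

# Hypothetical stand-alone copy of the double-quoted-string branch
# of $regexJSCode (copied by hand; not the full regex).
my $reString = qr{ (?: "[^"\\]* (?:\\.[^"\\]*)*" [^"'/]* )+ }x;

# Two lines taken from the problem input above.
my $js = qq{alert ("test");\nvar reForwardSlash = /\\//;\n};

# Show exactly what the string branch consumes at its first match.
if ($js =~ /($reString)/) {
    print "String branch consumed: [$1]\n";
}
```

If the captured text runs past the closing quote all the way to the first slash, it has also swallowed the = that the regular-expression branch expects to see before that slash. That would be consistent with the incorrect output above, but it is a hypothesis to check against the full regex, not a confirmed diagnosis.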

Help Wanted

So as you can see, I'm doing something wrong in the regex... Does anyone see what the problem is? Any help is greatly appreciated.


In reply to Extracting C-Style Comments (Revisited Again) by Incognito
