Incognito has asked for the wisdom of the Perl Monks concerning the following question:
The following regex code, which removes comments from a chunk of JavaScript, was developed with the help of several awesome Perl monks at this site...
    #----------------------------------------------------------------------
    # Here is the fundamental code to match JavaScript code.
    # This includes regular expressions and quoted strings.
    #----------------------------------------------------------------------
    my ($regexJSCode) = qr{
        # First, we'll list things we want
        # to match, but not throw away
        (?:
            # Match a regular expression (they start with ( or =).
            # Then the have a slash, and end with a slash.
            # The first slash must not be followed by * and cannot contain
            # newline chars. eg: var "re = /\*/;" or "a = b.match (/x/);"
            [\(=] \s* /
            (?:
                # char class contents
                \[ \^? ]? (?: [^]\\]+ | \\. )* ]
              |
                # escaped and regular chars (\/ and \.)
                (?: [^[\\\/]+ | \\. )*
            )*
            /
            (?:
                [gi]*
                # next characters are not word characters
                (?= [^\w] )
            )
        )
      |
        # or double quoted string
        (?: "[^"\\]* (?:\\.[^"\\]*)*" [^"'/]* )+
      |
        # or single quoted constant
        (?: '[^'\\]* (?:\\.[^'\\]*)*' [^"'/]* )+
    }x;

    #----------------------------------------------------------------------
    # Here is the fundamental code to match JavaScript comments and comment blocks.
    #----------------------------------------------------------------------
    my ($regexJSComments) = qr{
        # or we'll match a comment. Since it's not in the
        # $1 parentheses above, the comments will disappear
        # when we use $1 as the replacement text.
        /                   # (all comments start with a slash)
        (?:
            # traditional C comments
            (?: \* [^*]* \*+ (?: [^/*] [^*]* \*+ )* / )
          |
            # or C++ //-style comments
            (?: / [^\n]* )
        )
    }x;

    #----------------------------------------------------------------------
    # Get rid of all comments from the string.
    #----------------------------------------------------------------------
    $strOutput =~ s{
        ( $regexJSCode )
      |
        $regexJSComments
    }{$1}gsx;
    function test (str) {
        // A comment.
        alert ("test");
        var reForwardSlash = /\//;
        var reBackslash = /\\/;
        if (str.match(regexForwardslash) && str.match(regexBackslash)) {
            return true;
        }
    }
The choking occurs on the reForwardSlash variable:
    function test (str) {
        alert ("test");
        var reForwardSlash = /\
        var reBackslash = /\\/;
        if (str.match(regexForwardslash) && str.match(regexBackslash)) {
            return true;
        }
    }
If we get rid of the alert ("test") string, we get the proper parsing, so the regex we have developed has some issues. Here's a successful parse with the same regex, just different input.
    function test (str) {
        // A comment.
        var reForwardSlash = /\//;
        var reBackslash = /\\/;
        if (str.match(regexForwardslash) && str.match(regexBackslash)) {
            return true;
        }
    }
This is the expected parse output.
    function test (str) {
        var reForwardSlash = /\//;
        var reBackslash = /\\/;
        if (str.match(regexForwardslash) && str.match(regexBackslash)) {
            return true;
        }
    }
So as you can see, I'm doing something wrong in the regex... Does anyone see what the problem is? Any help is greatly appreciated.
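For anyone who wants to poke at this, here is a cut-down sketch that seems to reproduce the same symptom. The string branch is copied from the regex above; the regex-literal and comment branches are simplified stand-ins, and the names $literal, $string, $comment and strip_comments are made up for this test only. One thing it suggests (a guess, not a confirmed diagnosis) is that the trailing [^"'/]* in the string branch runs past the = sign, so the regex-literal branch never gets a chance to start at the following slash.

```perl
use strict;
use warnings;

# Double-quoted-string branch, copied from the regex in the post.
my $string = qr{ (?: "[^"\\]* (?:\\.[^"\\]*)*" [^"'/]* )+ }x;

# Simplified regex-literal branch (assumption: no character classes or flags).
my $literal = qr{ [\(=] \s* / (?: [^\\/\n]+ | \\. )* / }x;

# Simplified comment branch: //-style comments only.
my $comment = qr{ // [^\n]* }x;

# Hypothetical helper mirroring the s{...}{$1} call in the post.
# The /e replacement avoids an "uninitialized" warning when a comment matches.
sub strip_comments {
    my ($js) = @_;
    $js =~ s{ ( $literal | $string ) | $comment }{ defined $1 ? $1 : '' }gex;
    return $js;
}

my $with_string    = qq{alert ("test");\nvar re = /\\//;\n};
my $without_string = qq{var re = /\\//;\n};

# Show how far the string branch runs on its own.
if ($with_string =~ /($string)/) {
    print "string branch consumed: [$1]\n";
}

print strip_comments($with_string);     # the literal is cut down to "/\"
print strip_comments($without_string);  # the literal survives intact
```

With the quoted string present, the first capture runs from "test" all the way to just before the first slash, which includes the = sign; the // inside the literal is then treated as a comment. Without the quoted string, the literal is matched as code and kept.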
Re: Extracting C-Style Comments (Revisited Again)
by chipmunk (Parson) on Mar 06, 2002 at 02:42 UTC
by Incognito (Pilgrim) on Mar 06, 2002 at 03:56 UTC