in reply to Extracting C Style Comments Revised (JavaScript)
You're probably not going to like this, but handling this solely with regexps is going to be more trouble than you're likely willing to go through. Basically, unless you use fancy trickery like (?{code}), you can't express enough state in a regular expression to deal with arbitrary JavaScript. This is the same reason that you can't handle arbitrary (X|HT|SG)ML solely with regexen. If I recalled more than vague snippets I could probably back this up with some mumbo-jumbo about regular versus context-free languages, LL(1) and LALR(1) grammars and the like.
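To make the failure concrete, here is a minimal sketch of the usual one-pattern approach being fooled by comment markers inside a string literal. It uses Python's re module as a stand-in for Perl regexen, and the two JavaScript snippets are made up for illustration:

```python
import re

# The usual "one regex" approach: /* ... */, non-greedy, allowed to span lines.
NAIVE_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)

# A comment marker inside a string literal gets reported as a comment.
js1 = 'var s = "/* not a comment */"; /* real comment */'
# An opening marker inside a string swallows real code up to the next "*/".
js2 = 'a = "/*"; b = 1; /* c */'

print(NAIVE_COMMENT.findall(js1))  # ['/* not a comment */', '/* real comment */']
print(NAIVE_COMMENT.findall(js2))  # ['/*"; b = 1; /* c */']
```

The second case is the nasty one: stripping that "comment" would delete the live statement b = 1.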
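To do things `right', you'd basically need to write a JavaScript parser (or at least a tokenizer) that knows enough about JavaScript syntax to keep track of where it is, so that it can tell the difference between a quote (or a /*) occurring inside a JS regex literal and one occurring inside a double-quoted string. The worst offender is the slash itself: whether a / starts a regex literal or is a division operator depends on the token that precedes it, and that is exactly the kind of state a single regex can't carry.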
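Here is a rough sketch of what such a tokenizer might look like, written in Python rather than Perl purely for illustration. The names extract_comments, _skip_quoted, _skip_regex and REGEX_PRECEDERS are made up for this example. It decides regex-versus-division by looking only at the last significant character, which is a common heuristic and not what the ECMAScript grammar actually specifies: it treats the slash in return /re/ as division, treats the slash in a++ / 2 as the start of a regex, and doesn't look inside ${...} in template literals.

```python
from __future__ import annotations

# Heuristic (an assumption, not the ECMAScript rule): after one of these
# characters, a "/" is taken to start a regex literal rather than a division.
REGEX_PRECEDERS = set("(,=:[!&|?{};+-*%<>~^/")


def _skip_quoted(src: str, i: int) -> int:
    """Return the index just past the string/template literal starting at src[i]."""
    quote, j, n = src[i], i + 1, len(src)
    while j < n:
        if src[j] == "\\":
            j += 2              # skip the escaped character, e.g. \" or \\
        elif src[j] == quote:
            return j + 1
        else:
            j += 1
    return n                    # unterminated literal: consume the rest


def _skip_regex(src: str, i: int) -> int:
    """Return the index just past the regex literal starting at src[i] == '/'."""
    j, n, in_class = i + 1, len(src), False
    while j < n:
        ch = src[j]
        if ch == "\\":
            j += 2              # skip escaped char, e.g. \/
            continue
        if ch == "\n":
            return j            # unterminated regex: stop at end of line
        if ch == "[":
            in_class = True     # inside [...] a "/" does not end the regex
        elif ch == "]":
            in_class = False
        elif ch == "/" and not in_class:
            j += 1
            while j < n and src[j].isalpha():   # flags such as g, i, m
                j += 1
            return j
        j += 1
    return n


def extract_comments(src: str) -> list[str]:
    """Return the /* */ and // comments in src, skipping strings and regexes."""
    comments: list[str] = []
    i, n = 0, len(src)
    regex_ok = True             # at the start of input, "/" begins a regex
    while i < n:
        c = src[i]
        nxt = src[i + 1] if i + 1 < n else ""
        if c == "/" and nxt == "*":             # block comment
            end = src.find("*/", i + 2)
            end = n if end == -1 else end + 2
            comments.append(src[i:end])
            i = end                             # regex_ok left unchanged
        elif c == "/" and nxt == "/":           # line comment
            end = src.find("\n", i)
            end = n if end == -1 else end
            comments.append(src[i:end])
            i = end
        elif c in "'\"`":                       # string or template literal
            i = _skip_quoted(src, i)
            regex_ok = False                    # a value just ended
        elif c == "/" and regex_ok:             # regex literal
            i = _skip_regex(src, i)
            regex_ok = False
        else:
            if not c.isspace():                 # whitespace leaves state alone
                regex_ok = c in REGEX_PRECEDERS
            i += 1
    return comments


if __name__ == "__main__":
    sample = 'var r = /[/*]+/g; var s = "// no"; x = a / b; /* real */'
    print(extract_comments(sample))
```

The point isn't this particular heuristic; it's that the scanner has to carry the regex_ok flag from one token to the next, which is the state a single regex doesn't have. A real solution would track the previous token (including keywords like return) instead of the previous character.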