Removing '//' comments

Replies are listed 'Best First'.
Re: Removing '//' comments (tokenize) by tye (Sage) on Jul 06, 2006 at 05:36 UTC
First, try not to use a delimiter for your regex that forces you to escape a lot more characters.

More importantly, your regex has several mistakes. Trying to match "anything except this multi-character sequence" is almost always done wrong, even by top experts (at least a few times), so this isn't a big surprise. (:

For example, `[^\/][^\/]?` is really just the same as `[^/]+?`, and so avoids matching single slashes rather than avoiding double slashes, as the construct hints at. So `(?:(?:[^\/][^\/]?|)?` boils down to `([^/]?)?`, which is just an inefficient way of writing `[^/]?`.

`("|').?\2` will stop matching too early for `"This \"string\" with quotes"` but can also "backtrack" and match too much. You really want to force this construct to match only exactly quoted strings. So use `'([^'\\]+|\\.)*'|"([^"\\]+|\\.)*"` instead.

Your "stuff I don't care about" needs to avoid matching quotes or slashes so that you don't just skip over a starting quote as "something I don't care about". So your regex needs something like `([^'"/]+|$quotes|...)`.

And a tricky part is the "skip over / but not over //". Something like `(?<!/)/(?!/)`.

Which brings us to this:

    $text =~ s<
        (^
            (?: [^/'"]+
            |   '([^'\\]+|\\.)*'
            |   "([^"\\]+|\\.)*"
            |   (?<!/)/(?!/)
            )*
        )
        //.*
    ><$1>xgm;

Which likely has several bugs.

Note that I didn't allow for \ to cause the comment to continue on to subsequent lines because I both believe and hope that such doesn't actually work in the languages that I use //-comments in.

Note also that the pseudo tokenizer needs to match any constructs that could contain quotes or slashes so, for example, /* ... */ would need to be handled if such might be encountered.

- tye
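A minimal sketch of the tokenizing substitution from this reply, for trying it against the cases raised in the thread. Several things here are assumptions and not part of the post: the wrapper name `strip_line_comments` is hypothetical, `{}` delimiters stand in for `<>`, the groups inside the quoted-string branches are non-capturing, and the atomic `(?> ... )` groups are an addition meant to keep backtracking in check on lines that do not match. It shares the limitations listed above: no `/* ... */` handling and no backslash continuation.

```perl
use strict;
use warnings;

# Hypothetical helper, not from the thread: remove //-comments from a chunk
# of source text while leaving // inside quoted strings alone.
sub strip_line_comments {
    my ($text) = @_;
    $text =~ s{
        (                                      # capture what precedes the comment
            ^
            (?: (?> [^/'"]+ )                  # run of ordinary characters
            |   ' (?: (?> [^'\\]+ ) | \\. )* ' # single-quoted string
            |   " (?: (?> [^"\\]+ ) | \\. )* " # double-quoted string
            |   (?<!/) / (?!/)                 # lone slash, not part of a double slash
            )*
        )
        // .*                                  # the comment, up to end of line
    }{$1}xgm;
    return $text;
}

# A few of the cases raised in the thread.
print strip_line_comments("blah // comment\n");         # comment removed
print strip_line_comments(q{str = "//\"//";} . "\n");   # left untouched
print strip_line_comments("/ /\n");                     # left untouched
```

Because the ordinary-character run can cross newlines, the helper can be applied to a whole file at once as well as line by line.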
Re: Removing '//' comments by GrandFather (Saint) on Jul 06, 2006 at 04:51 UTC
Some interesting cases:

    use warnings;
    use strict;

    while (<DATA>) {
        s/(
            (?:(?:[^\/][^\/]?|)?    # anything, except comment symbol
            ("|').?\2               # quoted string (and take everything)
            (?:[^\/][^\/]?|)?       # anything, except comment symbol
            )?
          )
          (?:
            \/\/                    # comment symbol
            [^\\]*                  # anything, except continuation
          )??$
        /$1/x;
        print;
    }
    __DATA__
    /*
    //
    / /
    /
    str = "//\"//";

Prints:

    /*

    / /
    /
    str = "//\"

DWIM is Perl's answer to Gödel
Re: Removing '//' comments by davidrw (Prior) on Jul 06, 2006 at 06:12 UTC
My first thought here was Regexp::Common::comment ... Looking in its source, it basically does `s#//[^\n]*$##s;`. It's interesting to note that it behaves in the same way as the regexp GrandFather demo'd above, especially with `str = "//\"//";`.

tye's example (though I had to make it `s#foo#bar#xgm` instead of `s<foo><bar>xgm` before it would compile) does work with GrandFather's test cases.

    use warnings;
    use Regexp::Common qw /comment/;

    while (<DATA>) {
        my ($line, $simple, $RE, $tye) = ($_) x 4;
        $simple =~ s#//[^\n]*$##s;
        $RE     =~ s#$RE{comment}{Portia}#\n#;
        $tye    =~ s#
            (^
                (?: [^/'"]+
                |   '([^'\\]+|\\.)*'
                |   "([^"\\]+|\\.)*"
                |   (?<!/)/(?!/)
                )*
            )
            //.*
        #$1#xgm;
        print "  [DATA] $line";
        print "[simple] $simple";
        print "    [RE] $RE";
        print "   [tye] $tye";
        print "\n";
    }
    __DATA__
    blah // comment
    /*
    //
    / /
    /
    str = "//\"//";
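On the delimiter change: Perl lets bracketing delimiters such as `<>` nest, so an unescaped `<` inside the pattern, like the one in `(?<!/)`, has to be balanced by a `>` before the pattern can end. A small sketch that checks this with string `eval`; the variable names and the cut-down pattern are made up for illustration:

```perl
use strict;
use warnings;

# Compile the same substitution from source text, once per delimiter style.
# The pattern is a cut-down version of the lone-slash lookaround from the thread.
my $with_hash  = q{ my $t = "a / b"; $t =~ s#(?<!/)/(?!/)#|#g; $t };
my $with_angle = q{ my $t = "a / b"; $t =~ s<(?<!/)/(?!/)><|>g; $t };

my $hash_result  = eval $with_hash;    # compiles and runs
my $hash_error   = $@;
my $angle_result = eval $with_angle;   # the "<" in "(?<!" is never closed
my $angle_error  = $@;

print "s###  : ", (defined $hash_result  ? "ok, got '$hash_result'"  : "failed to compile"), "\n";
print "s<><> : ", (defined $angle_result ? "ok, got '$angle_result'" : "failed to compile"), "\n";
```

Picking a delimiter that does not appear in the pattern, as done with `#` above, sidesteps the problem.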
Re: Removing '//' comments by Zaxo (Archbishop) on Jul 06, 2006 at 04:48 UTC
~~Fails if '//' occurs in quoted text,~~

    int foo = sprintf( line, "Comparing, //-comments in C/C++ act like #-comments in %s. ", "perl");

Update: oops, misread, see GrandFather's examples instead.

After Compline,
Zaxo