Re: matching comments
by Eugene (Scribe) on Apr 22, 2000 at 00:33 UTC
|
The C-style comment-matching expression is:
/\*([^*]|\*+[^/*])*\*+/
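A minimal sketch of applying an expression of this shape in Perl, with the opening delimiter written as /\* so that the star is literal. The sub name c_comments is made up for illustration, and the pattern handles only non-nested comments.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return every non-nested C-style comment in $text.
# Pattern: "/*", then any run of non-star characters or of stars that are
# not followed by "/" or "*", then one or more stars and the closing "/".
sub c_comments {
    my ($text) = @_;
    my @found;
    while ($text =~ m{(/\*(?:[^*]|\*+[^/*])*\*+/)}g) {
        push @found, $1;
    }
    return @found;
}

print "$_\n" for c_comments("int a; /* one */ int b; /** two **/");
```

Using m{...} instead of /.../ avoids having to escape the slashes inside the pattern.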
| [reply] [d/l] |
Re: matching comments
by merlyn (Sage) on Apr 25, 2000 at 22:57 UTC
|
UNTESTED, but a lot of my stuff works without testing.... :-)
my $start = "/*";
my $end = "*/";
my $inside = 0;
my $oldpos = 0;
$_ = "your text /* goes here */ and here";
while (/(\Q$start\E|\Q$end\E)/g) {
    if ($1 eq $start) {
        if (++$inside == 1) {
            $oldpos = pos($_) - length($start);
        }
    } else {
        if (--$inside == 0) {
            print substr($_, $oldpos, pos()-$oldpos);
        }
    }
}
| [reply] |
Re: matching comments
by perlmonkey (Hermit) on Apr 23, 2000 at 05:11 UTC
|
This should do the trick for arbitrary comment flags:
/\Q$start\E(.*?)\Q$end\E/sg
I wrote a small test program:
#!/usr/bin/perl
#get start and end comment
my $start = $ARGV[0];
my $end = $ARGV[1] || "\n";
open(FILE, "test.txt") || die;
{
local $/ = undef; #set to 'slurp' mode
$_ = <FILE>; #read entire file into $_
}
close FILE;
#
#print all comments that are matched in file
#
while( /\Q$start\E(.*?)\Q$end\E/sg )
{
print $&, "\n";
}
For a test.txt I used:
blah blah blah blah blah
*** multi
line
comment **!!
blah blah blah blah blah blah blah blah blah blah
blah *** inline comment **!! blah blah
/* c comment 1 */
blah blah blah blah blah blah blah blah blah blah
blah blah blah blah blah blah blah blah blah blah
/*
* c comment 2
*/
blah blah // c++ comment 1
blah blah // c++ comment 2
For my execution results I got (my exe is called regex.pl):
prompt$ regex.pl '***' '**!!'
*** multi
line
comment **!!
*** inline comment **!!
prompt$ regex.pl '/*' '*/'
/* c comment 1 */
/*
* c comment 2
*/
prompt$ regex.pl '//'
// c++ comment 1
// c++ comment 2
| [reply] [d/l] [select] |
|
|
The problem with this--and I don't know if it's actually
going to be a problem for the OP, but in general, it
might be--is that this will catch comments inside quoted
strings. For example:
char * comptr = "Comment: /* In comment. */";
Your regular expression will match this, but it isn't
actually a comment.
Again, this may not be an issue for the OP, but if it is,
you should take a look at the FAQ
How do I use a regular expression to strip C style comments from a file?; perhaps you can extend this to your
uses. | [reply] [d/l] |
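One way to sketch the idea, assuming only double-quoted string literals with backslash escapes need protecting: match a whole string literal as one alternative and throw it away, and capture comments in the other alternative. This is a simplified illustration, not the FAQ's expression; the sub name is made up, and single-quoted character literals are not handled.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return C-style comments that are not inside "..." strings.
sub comments_outside_strings {
    my ($text) = @_;
    my @found;
    # Alternative 1: a complete double-quoted literal, matched and ignored.
    # Alternative 2: a non-nested comment, captured as group 1.
    while ($text =~ m{"(?:\\.|[^"\\])*"|(/\*.*?\*/)}sg) {
        push @found, $1 if defined $1;
    }
    return @found;
}

my $code = 'char * comptr = "Comment: /* In comment. */"; /* real comment */';
print "$_\n" for comments_outside_strings($code);
```

Because the scan runs left to right, whichever construct opens first (string or comment) consumes its whole body, so a comment marker inside a string is never seen as a comment start.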
Re: matching comments
by perlmonkey (Hermit) on Apr 24, 2000 at 10:55 UTC
|
Now I think I should say that I am not aware of any compiler that will compile code with nested comments, so this is probably not a big problem.
However, I played around and this seems to do the trick (just replace the while loop in my code above
with this one, and read the file into $file instead of $_):
while( $file =~ /\Q$start\E(.*?)\Q$end\E/sg )
{
    $a = $1;
    $match = $&;
    #look for more start tags in what we matched
    while( $a =~ /\Q$start\E/sg )
    {
        #balance the ending comments
        $file =~ /.*?\Q$end\E/sg;
        $match .= $&;
    }
    print $match, "\n";
}
For your test file I got what you wanted.
For other tests I used this test.txt:
blah blah
/* comment 1 */
blah blah
/* comment 2 */
blah blah
/* outer
/*
mid
/*
center
*/
mid
*/
outer
*/
And here are my results:
prompt$ regex.pl '/*' '*/'
/* comment 1 */
/* comment 2 */
/* outer
/*
mid
/*
center
*/
mid
*/
outer
*/
So enjoy this fanciful result.
I hope this helps.
| [reply] [d/l] [select] |
|
|
That fails on
/* outer /* mid */ /* mid */ outer */
Try:
($re = $_)=~s/((\Q$start\E)|(\Q$end\E)|.)/${['(','']}[!$2]\Q$1\E${[')','']}[!$3]/gs;
$re = join'|',map quotemeta,eval{/$re/};
warn $@ if $@ =~ /unmatched/;
print join"\n",/($re)/g,"";
| [reply] [d/l] |
Re: matching comments
by perlmonkey (Hermit) on Apr 25, 2000 at 23:24 UTC
|
I just tested the code above, and it does indeed work.
Out of curiosity I benchmarked the two solutions, and
merlyn's is twice as fast! (nice).
However, I would consider mine easier to follow, but maybe that is just
because I wrote it. I think I will have to look into using substr and pos for
performance reasons though.
Here is my test code:
#!/usr/bin/perl
use Benchmark;
#get start and end comment
my $start = $ARGV[0];
my $end = $ARGV[1] || "\n";
my $file;
open(FILE, "test.txt") || die;
{
local $/ = undef; #set to 'slurp' mode
$file = <FILE>; #read entire file into $file
}
close FILE;
timethese(100000, {
'parse1' => sub { &parse1($file) },
'parse2' => sub { &parse2($file) },
});
sub parse1
{ my $file = shift;
while( $file =~ /\Q$start\E(.*?)\Q$end\E/sg )
{
$a = $1;
$match = $&;
#look for more start tags in what we matched
while( $a =~ /\Q$start\E/sg )
{
#balance the ending comments
$file =~ /.*?\Q$end\E/sg;
$match .= $&;
}
#print $match, "\n";
}
return $match;
}
sub parse2
{
my $file = shift;
my $inside = 0;
my $oldpos = 0;
while ($file =~ /(\Q$start\E|\Q$end\E)/g) {
if ($1 eq $start) {
if (++$inside == 1) {
$oldpos = pos($file) - length($start);
}
} else {
if (--$inside == 0) {
return substr($file, $oldpos, pos($file)-$oldpos);
}
}
}
}
And here are my results (I used the same test.txt as my post above):
prompt% parse.pl '/*' '*/'
Benchmark: timing 100000 iterations of parse1, parse2...
parse1: 42 wallclock secs (30.07 usr + 0.11 sys = 30.18 CPU)
parse2: 17 wallclock secs (13.60 usr + 0.04 sys = 13.64 CPU)
| [reply] [d/l] [select] |
Re: matching comments
by Eugene (Scribe) on Apr 28, 2000 at 21:49 UTC
|
Here is another issue: merlyn's program does not catch an escaped comment delimiter, like
Some text /*comment \*/ more comment*/
In fact it completely weeds the escape character out.
Any ways around it?
| [reply] |
|
|
Quite true. Look at perlre for the comments
on the "?<!" operator (zero-width negative lookbehind assertion operator).
This will fix merlyn's code (where $file is the text you are parsing):
my $file = shift;
my $inside = 0;
my $oldpos = 0;
while ($file =~ /(?<!\\)(\Q$start\E|\Q$end\E)/g) {
    if ($1 eq $start) {
        if (++$inside == 1) {
            $oldpos = pos($file) - length($start);
        }
    } else {
        if (--$inside == 0) {
            print substr($file, $oldpos, pos($file)-$oldpos);
        }
    }
}
The (?<!\\)\Q$start\E will match what is in $start, but only when it is
not preceded by a '\' character. | [reply] [d/l] |
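A self-contained sketch of the same lookbehind idea, wrapped in a sub so it can be tried on the escaped example from the question. The sub name and the extra guard against a stray closing delimiter are additions for illustration and are not part of the code above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: depth-counting scanner that ignores any delimiter
# immediately preceded by a backslash. Returns the outermost comments.
sub nested_comments {
    my ($text, $start, $end) = @_;
    my ($inside, $oldpos) = (0, 0);
    my @found;
    while ($text =~ /(?<!\\)(\Q$start\E|\Q$end\E)/g) {
        if ($1 eq $start) {
            # remember where the outermost comment begins
            $oldpos = pos($text) - length($start) if ++$inside == 1;
        }
        elsif ($inside > 0 && --$inside == 0) {
            # assumption: a closing delimiter seen at depth 0 is skipped
            push @found, substr($text, $oldpos, pos($text) - $oldpos);
        }
    }
    return @found;
}

print "$_\n" for nested_comments('Some text /*comment \*/ more comment*/', '/*', '*/');
```

The escaped \*/ is left in place inside the returned text; only its role as a terminator is suppressed.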
Re: matching comments
by Eugene (Scribe) on Apr 24, 2000 at 07:53 UTC
|
I am not worried about comments inside
quotes, but what about nested comments?
For my test.txt I used :
blah blah blah blah blah
/*
multi /* bla */
line
comment
*/
and the result was :
/*
multi /* bla */
Any idea on how to handle those so the result would be like
/*
multi /* bla */
line
comment
*/
Thanks, Eugene | [reply] |
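For readers on Perl 5.10 or later (newer than the Perl available when this thread was written), a recursive pattern is one possible way to get that result. This is a sketch under that version assumption, hard-coded to /* and */; the sub name is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;   # (?1) recursion into group 1 needs Perl 5.10 or later

# Hypothetical helper: return outermost /* ... */ comments, allowing nesting.
# Inside a comment, each step is either one character that does not begin
# a delimiter, or a complete nested comment matched by recursing into group 1.
sub nested_c_comments {
    my ($text) = @_;
    my @found;
    while ($text =~ m{(/\*(?:(?!/\*|\*/).|(?1))*\*/)}sg) {
        push @found, $1;
    }
    return @found;
}

my $sample = "blah blah\n/*\nmulti /* bla */\nline\ncomment\n*/\n";
print "$_\n" for nested_c_comments($sample);
```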
Re: matching comments
by Eugene (Scribe) on Apr 25, 2000 at 01:20 UTC
|
thanks for your time. This is what I needed.
Eugene | [reply] |