in reply to Regex Strikes again!

Assuming you are looking for all the comments in some nasty (large) Java programs: first, what are all the possible ways a comment can be malformed? And are those cases critical to your life goal?

That said, here is a sample input file with a few interesting possibilities (yes, it compiles):

/**
 * The HelloWorldApp Class implements an application that displays "Hello World!" to standard output
 */
public class HelloWorldApp {
    public static void main (String[] args) {
        // Display "Hello World!"
        System.out.println("Hello World!"); // end of line comment
        System.out.println("Hello World!"); /* end of line comment */
        System.out.println("Hello World!/*"); /* end of line comment */
        /* this is a
           multi line
           comment */
        /* another
           comment
        */
        // /* is this a valid comment
        /* // */
        System.out.println("//Hello World!/*"); // is /* this valid
    }
}

Below is some rough code to start grabbing comments. Coding for the cute exceptions is left as an exercise for the student (I have always wanted to use that line, since it was used on me a long time ago) or as future discussion points. It prints the line number of the file where the comment begins and then the comment.

#!/usr/bin/perl
use strict;
use warnings;

my $infile = "HelloWorldApp.java";
my %linehash;
my $linecounter = 0;
my $comment_started = 0;
my $multiline_comment = "";

open(my $in, '<', $infile) or die "could not open $infile: $!\n";
while (<$in>) {
    ## count from 1 so a comment that opens on the first line
    ## still leaves $comment_started true
    $linecounter++;

    ## you may need to get creative in matching comments
    ## because java allows some fun combinations - see HelloWorld
    ##
    ## grab single line comments first else look for multiline
    if (/\/\/.*\n/ || /\/\*.*\*\//) {
        $linehash{$linecounter} = $_;
    }
    else {
        ## possible multiline comment start
        if ($comment_started) {
            $multiline_comment = $multiline_comment . $_;
            ## don't mess with $_ as later comparisons may need the newline in place
            chomp $multiline_comment;
            $multiline_comment = $multiline_comment . " ";
            ## end of multiline comment
            if (/\*\//) {
                $linehash{$comment_started} = "$multiline_comment\n";
                $comment_started = 0;
                $multiline_comment = "";
            }
        }
        ## start multiline comment, keeping the text of the opening line
        elsif (/\/\*/) {
            $comment_started = $linecounter;
            chomp($multiline_comment = $_);
            $multiline_comment .= " ";
        }
    }
}
close($in);

my @keys = sort { $a <=> $b } keys %linehash;
for (@keys) {
    print "key=$_ value=$linehash{$_}";
}
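One of the cute exceptions in the sample file is a comment marker inside a string literal, as in System.out.println("Hello World!/*");. Here is a minimal sketch of one way to sidestep it, assuming string literals never span lines: blank out the double-quoted strings on a copy of the line and run the comment tests against that copy. The helper name strip_strings is hypothetical, and the sketch ignores char literals such as '"'. It also empties quoted text that sits inside a comment, so use the stripped copy only for detection and store the original line.

```perl
use strict;
use warnings;

# Hypothetical helper: empty out double-quoted string literals so that
# comment markers inside them are not mistaken for real comments.
# Assumes a string literal never spans more than one line.
sub strip_strings {
    my ($line) = @_;
    # A string is a quote, then any run of non-quote/non-backslash
    # characters or backslash escapes, then the closing quote.
    $line =~ s/"(?:[^"\\]|\\.)*"/""/g;
    return $line;
}

my $code = 'System.out.println("Hello World!/*"); /* end of line comment */';
print strip_strings($code), "\n";
```

In the loop above, the idea would be to test strip_strings($_) against the comment patterns while still saving $_ itself in %linehash.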

Enjoy
John

Re: Re: Regex Strikes again!
by nofernandes (Beadle) on Jul 16, 2003 at 14:53 UTC

    Your code is very good, thank you. If I had had it a few weeks earlier, it might have been a very good option for my program.

    But the other regex works just fine and catches all the cases! It is also easy to change so that it serves other languages such as PL/SQL, Pro*C, etc.

    The only problem I have is how to get the line numbers!

    I've written some code that compares the two files, and it works "almost" fine, but on larger files it might be a bit slow.

    I say "almost" because it "flips over" when it finds lines like this: /*********************/ and I don't have any idea why!

    Maybe you can help me figure out what the problem is:

    foreach my $line (@fich) {
        $i++;
        $flag = 0;
        foreach $comm (@com) {
            if ((($line eq $comm) || (index($line, $comm) > -1)) && ($flag == 0)) {
                print "Linha $i: $comm";
                $flag = 1;
                print "Flag1: $flag\n";
            }
        }
    }

    The two arrays hold the two files: @com contains the file with the extracted comments, and @fich contains the source code.
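    For the speed concern, here is a rough sketch of a single-pass alternative to the nested loops. It rests on assumptions that are not stated in the post: the entries of @com appear in the same order as in the source, and each entry is one line of comment text. The helper name locate_comments is hypothetical. Blank entries are skipped because index() reports a match for an empty string on every line. If an entry is never found, the scan stalls at that entry, so this is only a starting point.

```perl
use strict;
use warnings;

# Hypothetical helper: pair each extracted comment line with the source
# line it came from, walking both lists once instead of nesting loops.
# Assumes the comments are listed in the order they occur in the source.
sub locate_comments {
    my ($fich, $com) = @_;
    my @found;    # list of [line_number, comment_text]
    my $c = 0;    # index of the next comment to look for
    for my $i (0 .. $#$fich) {
        while ($c <= $#$com) {
            (my $want = $com->[$c]) =~ s/\s+\z//;    # drop trailing newline/space
            if ($want eq '') { $c++; next; }         # blank entries match everything
            last if index($fich->[$i], $want) < 0;   # not on this line, try the next line
            push @found, [ $i + 1, $want ];
            $c++;
        }
    }
    return @found;
}

my @fich = ("int a; // first\n", "/*****/\n", "int b; /* x */ // y\n");
my @com  = ("// first\n", "/*****/\n", "/* x */\n", "// y\n");
printf "Linha %d: %s\n", @$_ for locate_comments(\@fich, \@com);
```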

    Thank you very much!!

    Nuno
      This will give you the correct line numbers. Counting only the newlines inside the matched text isn't what you want, unless I misunderstood your code: the newlines in the text before each match have to be counted too. I slurp the file into the variable $f and use it wherever the earlier code referred to $slurpedfile.

      hth
      John

      #!/usr/bin/perl
      use strict;
      use warnings;

      my $file = "theinputfile";
      undef $/;    # In order to read the whole file at once
      open(my $fh, '<', $file) or die "could not open $file: $!\n";
      my $f = <$fh>;
      close($fh);

      my @matches = $f =~ m{
            ( /\* .*? \*/ )
          | ( \/\/[^\n]* )
          | " (?: [^"\\]* | \\. )* "
          | ' (?: [^'\\]* | \\. )* '
          | . [^/"']*
      }xgs;

      @matches = grep { defined $_ } @matches;    # get rid of undefs

      my $linenum = 1;
      foreach my $match (@matches) {
          # $slurpedfile =~ /\Q$match/;
          $f =~ /\Q$match/;
          my $before = $`;    # text between the previous comment and this one
          # $slurpedfile = $';
          $f = $';            # keep only the text after this comment
          $linenum += $before =~ tr/\n/\n/;
          print "Line $linenum\t$match\n";
          $linenum += $match =~ tr/\n/\n/;
      }
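      A variation on the same idea, offered only as a sketch: instead of searching for each comment again with \Q$match (which can land on the same text earlier in the remaining input, for example inside a string literal), take the line number from the offset where the comment was matched. Perl records the start offset of capture group 1 in $-[1]. The helper name comments_with_lines is hypothetical, and the regex is a simplified form of the one above that walks ordinary code one character at a time, so it will be slower on big files.

```perl
use strict;
use warnings;

# Hypothetical helper: return [line_number, comment] pairs for a source
# string, using the match offset of each comment to count lines.
sub comments_with_lines {
    my ($src) = @_;
    my @found;
    my ($linenum, $last) = (1, 0);    # current line, offset already counted
    while ($src =~ m{
          ( /\* .*? \*/ | // [^\n]* )    # group 1: a comment
        | " (?: [^"\\] | \\. )* "        # string literal, skipped
        | ' (?: [^'\\] | \\. )* '        # char literal, skipped
        | .                              # any other single character
    }xgs) {
        next unless defined $1;
        # Count the newlines between the previous comment start and this one.
        my $gap = substr($src, $last, $-[1] - $last);
        $linenum += ($gap =~ tr/\n//);
        $last = $-[1];
        push @found, [ $linenum, $1 ];
    }
    return @found;
}

my $java = "a; // one\nprint(\"/* no */\");\n/* two\n lines */ x\n";
print "Line $_->[0]\t$_->[1]\n" for comments_with_lines($java);
```

      Each pair holds the line on which the comment starts, which is the number Nuno asked for; a multi-line comment is reported once, at its first line.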