abhishes has asked for the wisdom of the Perl Monks concerning the following question:

Hello All,
I am writing a simple Perl script to count the lines of
source code vs. comments in my Java project.

I have the following statement to determine whether a line
is commented or not.
if (length($line) != 0) {
    if (/^\/\// or /^\/\*/ or /^\*/) {
        $comment = $comment + 1;
    }
    else {
        print "$line\n";
        $code = $code + 1;
    }
}

But it does not work because all the lines which are like
/** this is a comment */ are being included as source.

What is wrong with my regular expression?

regards,
Abhishek.

Replies are listed 'Best First'.
Re: Regular experssion for starts with doesn't work
by sauoq (Abbot) on Sep 20, 2002 at 08:24 UTC

    You aren't exactly matching comments. It looks like you are matching lines that start with "//", "/*", or "*". Assuming that is good enough for your purposes, this might work as a quick and dirty fix:

    if ($line =~ m!^\s*(?://|/?\*)!)
    That allows leading space for some robustness, changes the delimiter for readability, combines the regular expressions, and matches against $line, which is presumably what you want.
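
A minimal sketch of how that pattern classifies a few lines. The helper name is_comment_line and the sample lines are made up for illustration; they are not from the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the line starts (after optional whitespace)
# with "//", "/*", or "*". This is a heuristic, not a comment parser.
sub is_comment_line {
    my ($line) = @_;
    return $line =~ m!^\s*(?://|/?\*)! ? 1 : 0;
}

my @samples = (
    '/** this is a comment */',
    '   // indented comment',
    ' * continuation of a block comment',
    'int x = 1; // trailing comment',
    'String s = "/* not a comment */";',
);

for my $line (@samples) {
    printf "%-40s %s\n", $line, is_comment_line($line) ? 'comment' : 'code';
}
```

The first three samples are reported as comments and the last two as code. Note that a code line which happens to begin with "*" (say, a wrapped multiplication) would also be counted as a comment, which is one reason this is only a quick fix.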

    The line

    if (length($line) != 0)
    would probably be better expressed as simply if (length($line)) too.

    As diotalevi already mentioned, however, you should really be parsing (at least minimally) to find comments successfully in every case and without false positives. If you want to go that route and build a truly useful tool, it might help you to look into JPL [1] (Java/Perl Lingo) source code. Maybe. I don't really know as I've pretty successfully avoided Java since 1.0 and therefore have never needed JPL.

    [1] It seems jens posted a question about JPL earlier. I must have seen his post in the newest nodes and been reminded of JPL's existence. :-)

    -sauoq
    "My two cents aren't worth a dime.";
    
Re: Regular experssion for starts with doesn't work
by kabel (Chaplain) on Sep 20, 2002 at 05:33 UTC
    a regular expression without the =~ operator tests the $_ scalar. but it seems that you intended $line to hold each line of the java source code. if that is not true, forget my posting... ;)

    you can use (nearly) any other character as delimiters to regular expressions. that is sometimes useful if the slash is to be matched against, for example:
    if (m.^//.) ...
    should work, too.
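
A small sketch of the difference between a bare match and one bound with =~. The variable names and values are invented for the demonstration, and m{...} is used as the alternate delimiter here.

```perl
use strict;
use warnings;

my $line = '// a comment line';

# $_ holds something unrelated to $line.
$_ = 'int x = 1;';

my $bare  = m{^//} ? 1 : 0;           # no =~, so this tests $_
my $bound = $line =~ m{^//} ? 1 : 0;  # bound with =~, so this tests $line

print "bare match: $bare, bound match: $bound\n";
```

The bare match fails because $_ does not start with "//", while the bound match succeeds. That is the same situation as a loop that reads each line into $line but matches without =~.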

      You'll also want to note that comment-like sequences can appear inside strings, or can themselves be commented out. You've assumed all comment sequences begin at the start of the line. You could at least allow optional leading whitespace.

      Java sequences you didn't consider

      String comment_a = "/* this is not a comment */";
      String comment_b = "" + // /*
          "this is actually used" + // */
          "";

      You'll probably want to go to java.sun.com for the actual grammar used for parsing Java. That is ... unless someone else has written a grammar already. The reference is at http://java.sun.com/docs/books/jls/second_edition/html/jTOC.doc.html and it doesn't look *too* involved...

      This is a slightly cleaned up version of what you originally posted with some fixes for legibility.

      if ( length $line ) {
          if (   $line =~ m|^\s* // |x
              or $line =~ m|^\s* /\* |x
              or $line =~ m|^\s* \* |x )
          {
              $comment = $comment + 1;
          }
          else {
              print "$line\n";
              $code = $code + 1;
          }
      }
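
For the "parse at least minimally" route, here is a rough sketch of a scanner that tracks /* ... */ across lines and skips comment markers inside string and character literals. The function name classify_lines and its 'code' / 'comment' / 'blank' labels are assumptions made for this example. It is not a full Java lexer and does not follow the grammar in the language specification.

```perl
use strict;
use warnings;

# Hypothetical helper: label each line 'code', 'comment', or 'blank'.
# A line with both code and a comment counts as 'code'.
sub classify_lines {
    my @lines    = @_;
    my $in_block = 0;    # true while inside a /* ... */ comment
    my @kinds;

    for my $line (@lines) {
        my ($has_code, $has_comment) = (0, 0);
        my $pos = 0;
        my $len = length $line;

        while ($pos < $len) {
            my $rest = substr $line, $pos;
            if ($in_block) {
                $has_comment = 1;
                my $end = index $rest, '*/';
                if ($end < 0) { $pos = $len }    # comment runs past this line
                else          { $in_block = 0; $pos += $end + 2 }
            }
            elsif ($rest =~ /^(\s+)/) {
                $pos += length $1;               # skip whitespace
            }
            elsif ($rest =~ m{^//}) {
                $has_comment = 1;                # rest of line is a comment
                $pos = $len;
            }
            elsif ($rest =~ m{^/\*}) {
                $has_comment = 1;                # block comment opens here
                $in_block    = 1;
                $pos += 2;
            }
            elsif ($rest =~ /^("(?:[^"\\]|\\.)*"|'(?:[^'\\]|\\.)*')/) {
                $has_code = 1;                   # string or char literal
                $pos += length $1;
            }
            else {
                $has_code = 1;                   # any other character is code
                $pos++;
            }
        }
        push @kinds, $has_code ? 'code' : $has_comment ? 'comment' : 'blank';
    }
    return @kinds;
}

my @java = (
    'String comment_a = "/* this is not a comment */";',
    'String comment_b = "" + // /*',
    '    "this is actually used" + // */',
    '    "";',
    '/** this is a comment */',
);
my @kinds = classify_lines(@java);
print "$kinds[$_]: $java[$_]\n" for 0 .. $#java;
```

With these samples the first four lines are labelled code and the last one comment. Empty lines inside a block comment come out as 'blank'; whether to count them as comments is a policy choice left open here.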