Assuming you are looking for all comments in some nasty (large) Java programs, first ask: what are all the possible ways a comment can be malformed? And are they critical to your life goal?

That said, here is a sample input file with a few interesting possibilities (yes, it compiles):

/**
 * The HelloWorldApp Class implements an application that displays "Hello World!" to standard output
 */
public class HelloWorldApp {
    public static void main (String[] args) {
        // Display "Hello World!"
        System.out.println("Hello World!"); // end of line comment
        System.out.println("Hello World!"); /* end of line comment */
        System.out.println("Hello World!/*"); /* end of line comment */
        /* this
           is a
           multi line comment */
        /* another
           comment */
        // /* is this a valid comment /* // */
        System.out.println("//Hello World!/*"); // is /* this valid
    }
}

Below is some rough code to start grabbing comments. Coding for the cute exceptions is left as an exercise for the student (I have always wanted to use that line, since it was used on me a long time ago) or as future discussion points. It prints the line number of the file where the comment begins and then the comment.

#!/usr/bin/perl
use strict;
use warnings;

my $infile = "HelloWorldApp.java";
my %linehash;
my $linecounter       = 0;
my $comment_started   = 0;     # line where an open /* began; 0 means "not in a comment"
my $multiline_comment = "";

open my $in, '<', $infile or die "could not open $infile: $!\n";
while (<$in>) {
    ## count from 1, so a comment starting on the first line is not mistaken for "no comment"
    $linecounter++;

    ## you may need to get creative in matching comments
    ## because java allows some fun combinations - see HelloWorld
    ##
    ## grab single line comments first else look for multiline
    if (/\/\/.*\n/ || /\/\*.*\*\//) {
        $linehash{$linecounter} = $_;
    }
    else {
        ## inside a multiline comment
        if ($comment_started) {
            $multiline_comment = $multiline_comment . $_;
            ## don't mess with $_ as later comparisons may need the newline in place
            chomp $multiline_comment;
            $multiline_comment = $multiline_comment . " ";
            ## end of multiline comment
            if (/\*\//) {
                $linehash{$comment_started} = $multiline_comment . "\n";
                $comment_started   = 0;
                $multiline_comment = "";
            }
        }
        ## start multiline comment, keeping the text of its first line
        elsif (/\/\*/) {
            $comment_started = $linecounter;
            chomp( $multiline_comment = $_ );
            $multiline_comment = $multiline_comment . " ";
        }
    }
}
close $in;

my @keys = sort { $a <=> $b } keys %linehash;
for (@keys) {
    print "key=$_ value=$linehash{$_}";
}
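For the "cute exceptions", one possible direction (only a sketch, not a full Java lexer) is to stop working line by line and instead walk the whole file with a single tokenizing regex that consumes string and character literals before it looks for comments. That way a `/*` or `//` inside quotes is never mistaken for a comment. The sub name `find_comments` and the capture names are my own inventions for illustration. The sketch assumes Perl 5.10 or later and ignores Java text blocks (`"""`) and unicode escapes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One token per match: a quoted literal, a comment, or any single other character.
# Literals are tried first so comment markers inside quotes are swallowed harmlessly.
my $token = qr{
      (?<str>     " (?: \\. | [^"\\\n] )* " | ' (?: \\. | [^'\\\n] )* ' )
    | (?<comment> // [^\n]* | /\* .*? \*/ )
    | (?<other>   . )
}sx;

# Hypothetical helper: returns a list of [line_number, comment_text] pairs,
# where line_number is the 1-based line on which the comment begins.
sub find_comments {
    my ($src) = @_;
    my @found;
    my $line = 1;
    while ( $src =~ /\G$token/gc ) {
        my $comment = $+{comment};
        push @found, [ $line, $comment ] if defined $comment;
        # Advance the line counter by the newlines in whatever was just consumed.
        my $matched = $+{str} // $comment // $+{other};
        $line += ( $matched =~ tr/\n// );
    }
    return @found;
}

# Small inline sample instead of reading HelloWorldApp.java
my $java = <<'END';
/* first
   comment */
System.out.println("//not a comment/*"); // real comment
END

for my $c ( find_comments($java) ) {
    printf "line %d: %s\n", @$c;
}
```

An unterminated string or block comment simply falls through to the "other" branch one character at a time, so the loop always makes progress. Whether that is the right behaviour for malformed input is exactly the kind of question raised at the top of this post.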

Enjoy
John


In reply to Re: Regex Strikes again! by johndageek
in thread Regex Strikes again! by nofernandes
