Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I read a file into an array using @file=<FILE>.
The problem is that I have to parse the data in the file while skipping C-style comments.
They may be only one line long:
/* this is a comment */
or multiple lines:
/*
these 3 lines are comments
*/
I'm currently using the following code to process the file:

    open (FILE, "file");
    @file = <FILE>;
    close(FILE);
    for ($i = 0; $i <= $#file; $i++) {
        # Here I'm trying to catch the comments and ignore them
        if ($file[$i] =~ /\/\*/) {                  # If comment is found ...
            while ($file[$i] !~ /\*\//) { $i++; }   # Check lines till finding end of comment
            next;
        }
        # ... processing data ...
    }

My problem is that /\/\*/ doesn't match "/*" (without the quotes).
I have also tried /\/\042/, which doesn't match either.
Does anyone have an idea how to solve this problem?
Thanks in advance.

Re: Matching C-Style comments
by mirod (Canon) on Jun 19, 2001 at 19:46 UTC

    Instead of storing the text in an array, store it in a single string; then you can use an ordinary regexp:

        #!/bin/perl -w
        use strict;

        undef $/;                         # so <DATA> will slurp the entire file
        my $string = <DATA>;
        $string =~ s{/\*.*?\*/}{}gsx;     # do not forget the s modifier so . matches \n too
        print $string;

        __DATA__
        A text /* with comments */, some even /* cross
        several lines */, some even /* cross
        many,
        many,
        many */ lines

    Of course, you can also "unroll the loop", as Brovnik mentioned.
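    Brovnik's reply isn't reproduced in this copy of the thread, so the following is only a guess at what "unrolling the loop" means here: Jeffrey Friedl's technique from Mastering Regular Expressions, which replaces the lazy .*? with explicit runs of non-star and star characters. A minimal sketch (strip_c_comments is a hypothetical helper name):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Strip C-style comments with the "unrolled loop" pattern:
#   /\*                  the opening "/*"
#   [^*]*\*+             a run of non-stars, then one or more stars
#   (?:[^/*][^*]*\*+)*   stars NOT followed by "/" do not end the comment
#   /                    the closing "/"
# [^*] also matches "\n", so no /s modifier is needed for multi-line comments.
sub strip_c_comments {
    my ($text) = @_;
    $text =~ s{/\*[^*]*\*+(?:[^/*][^*]*\*+)*/}{}g;
    return $text;
}

# prints "A text , some even  lines"
print strip_c_comments("A text /* with comments */, some even /* cross\nseveral lines */ lines\n");
```

    The unrolled pattern consumes whole runs of characters instead of checking for */ after every single character as .*? does, and it needs no /s modifier. Like mirod's version, it still knows nothing about string literals.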

Re: Matching C-Style comments
by Brovnik (Hermit) on Jun 19, 2001 at 19:39 UTC
(tye)Re: Matching C-Style comments
by tye (Sage) on Jun 20, 2001 at 02:21 UTC

    Note that if you are processing C source code, for example, all of the methods presented so far can make mistakes. Running them on this:

        int fun( char *string )
        {
            // Check string/*string for validity:
            if( NULL != string && '\0' != *string ) {
                /* Okay! */
                static char *punct= "!@#-+/*.,;";
                /* Strip some punctuation characters: */
                stripchars( string, punct );
                ....
    will strip some important things that aren't comments! Dealing with such problems gets pretty close to writing a parser, so you might want to consider Parse::RecDescent (which may even come with an example grammar that parses C).

    Alternatively, you can write your own parser that understands just a few things, such as C strings, C++-style comments and, of course, C-style comments, and work your way along the input. Sorry, I'm not going to throw a working version of that together right here, nor am I going to track one down for you (though I expect this has been written more than once, so you can probably find one if you are good at searching).
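    A minimal sketch of such a hand-written scanner, assuming well-formed C input: it copies "string" and 'c' literals through untouched, drops // and /* */ comments, and ignores details a real C lexer would handle (trigraphs, backslash-newline splices, preprocessor directives). The name strip_comments_scanner is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Walk the source one character at a time and decide what each one starts:
#   /* ... */     C comment            -> replaced by one space, as C itself does
#   // ...        C++ comment          -> dropped up to (not including) the newline
#   "..." '...'   string/char literal  -> copied unchanged, honouring \ escapes
# Anything else is copied as-is.
sub strip_comments_scanner {
    my ($src) = @_;
    my ($out, $i, $n) = ('', 0, length $src);
    while ($i < $n) {
        my $c    = substr($src, $i, 1);
        my $next = $i + 1 < $n ? substr($src, $i + 1, 1) : '';
        if ($c eq '/' && $next eq '*') {
            my $end = index($src, '*/', $i + 2);
            $i = $end < 0 ? $n : $end + 2;    # unterminated comment: drop the rest
            $out .= ' ';
        }
        elsif ($c eq '/' && $next eq '/') {
            my $end = index($src, "\n", $i);
            $i = $end < 0 ? $n : $end;        # the newline is copied next round
        }
        elsif ($c eq '"' || $c eq "'") {
            my $j = $i + 1;
            while ($j < $n) {
                my $d = substr($src, $j, 1);
                if ($d eq '\\') { $j += 2; next }    # skip the escaped character
                $j++;
                last if $d eq $c;                    # reached the closing quote
            }
            $j = $n if $j > $n;
            $out .= substr($src, $i, $j - $i);
            $i = $j;
        }
        else {
            $out .= $c;
            $i++;
        }
    }
    return $out;
}

my $c_code = qq{x = 1; // note /* not a comment\nputs("/* keep */"); /* drop */\n};
print strip_comments_scanner($c_code);
```

    On tye's fragment, the /* inside the // comment and the /* inside the punct string both survive here, whereas mirod's s{/\*.*?\*/}{}gs would delete everything from string/*string through the /* Okay! */ comment (taking the if test with it) and then eat the end of the punct string.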

    I just didn't want you to get caught by surprise on this.

            - tye (but my friends call me "Tye")
Re: Matching C-Style comments
by stefan k (Curate) on Jun 19, 2001 at 19:42 UTC
    Hi,
    If I'm not completely wrong, it should work using
    if ($file[$i] =~ m(/\*)) ...
    This uses the match operator m with parentheses as the regexp delimiters, so only the * has to be escaped.
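    For reference, a few equivalent ways of spelling that match ($line and @patterns are just illustrative names). The first pattern is exactly the asker's /\/\*/, which does match "/*" when tried on its own. Also, \042 is the octal escape for a double quote; the octal escape for * is \052.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = "int x; /* a comment */\n";

# Equivalent ways of matching the two characters "/*":
my @patterns = (
    qr/\/\*/,     # default slashes: escape both "/" and "*" (the asker's regex)
    qr(/\*),      # parens as delimiters: only "*" needs escaping
    qr{\Q/*\E},   # \Q...\E escapes the metacharacters automatically
    qr{/\052},    # octal escape for "*"; \042 would be a double quote
);

for my $re (@patterns) {
    print "$re matches\n" if $line =~ $re;
}
```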

    Actually I tried it on the command line using

        $ perl -p -e 'print "MATCH" if m(/\*)'
        foo
        foo
        /*
        MATCH/*

    which seems to work, as far as I can tell. Correct me if I'm getting things wrong. BTW: is there a better command line for this? I don't understand why the first line (which is what I type in) is echoed back once.

    Regards... Stefan

      Well, your regex isn't enough: while it matches /*, it doesn't check whether that occurs inside a string, and inside a string it isn't the start of a comment. I won't repeat the regex here; you can find it in the hip owls book, and pointers to it have already been posted.
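      The "hip owls book" is Jeffrey Friedl's Mastering Regular Expressions, and perlfaq6 has a version of its comment-stripping regex in the entry on stripping C-style comments. A sketch in that spirit, laid out with /x for readability (strip_c_comments_keep_strings is a hypothetical name, and this layout is not the book's). It does not handle C++ // comments.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Remove C comments but leave "..." and '...' literals alone.
# At each position the regex tries, in order:
#   1. a complete C comment (unrolled-loop form), replaced by ''
#   2. a double-quoted string, a char literal, or a run of ordinary
#      text, captured in $1 and put back unchanged
# A literal is consumed whole, so a "/*" inside it never starts a comment.
sub strip_c_comments_keep_strings {
    my ($text) = @_;
    $text =~ s{
        /\*[^*]*\*+(?:[^/*][^*]*\*+)*/      # C comment
      | (                                   # $1: text to keep
          "(?:\\.|[^"\\])*"                 #   double-quoted string with escapes
        | '(?:\\.|[^'\\])*'                 #   char literal
        | .[^/"'\\]*                        #   one char, then ordinary text
        )
    }{defined $1 ? $1 : ''}gsex;
    return $text;
}

my $code = 'char *p = "/* not a comment */"; /* real comment */ int x;';
print strip_c_comments_keep_strings($code), "\n";
```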

      As for your second question, it's not just the first line that's echoed; the second line is too, as there's a /* after the MATCH! The reason is your use of -p, which prints $_ for each line of input processed, regardless of any print statements in the program. You probably want to use -n instead.
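      perlrun describes -n as wrapping the program in a while (<>) { ... } loop, and -p as the same loop plus a continue block that prints $_. A small simulation over an in-memory string (like_dash_n and like_dash_p are made-up helper names) shows why Stefan's lines came back, and why perl -n -e 'print "MATCH\n" if m(/\*)' would print only the MATCH lines.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Emulate `perl -n`: only explicit prints produce output.
sub like_dash_n {
    my ($input) = @_;
    my $out = '';
    open my $in, '<', \$input or die "Cannot open in-memory input: $!";
    while (<$in>) {
        $out .= "MATCH\n" if m(/\*);
    }
    close $in;
    return $out;
}

# Emulate `perl -p`: the same loop, but every line is echoed afterwards.
sub like_dash_p {
    my ($input) = @_;
    my $out = '';
    open my $in, '<', \$input or die "Cannot open in-memory input: $!";
    while (<$in>) {
        $out .= "MATCH" if m(/\*);
    }
    continue {
        $out .= $_;    # the implicit print that -p adds
    }
    close $in;
    return $out;
}

my $typed = "foo\n/*\n";
print "-p output:\n", like_dash_p($typed);
print "-n output:\n", like_dash_n($typed);
```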

      -- Abigail

Re: Matching C-Style comments
by scain (Curate) on Jun 19, 2001 at 19:42 UTC
    I haven't tried out your code, but it doesn't seem that you've considered all the places a comment can appear. For instance, have you considered this case:

    some code here/* comments */ more code here .
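    Scott's case can be handled without giving up the asker's line-by-line approach by carrying an "inside a comment" flag from one line to the next. A minimal sketch, assuming the lines come from @file = <FILE> as in the question (strip_comments_from_lines is a hypothetical helper):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Strip C-style comments from an array of lines, keeping code that shares
# a line with a comment, e.g. "some code here/* comments */ more code here".
# Lines left with nothing but whitespace are dropped, so the processing loop
# only sees data. Like the original loop, this does not understand string
# literals, so a "/*" inside quotes still opens a comment.
sub strip_comments_from_lines {
    my @lines = @_;
    my @out;
    my $in_comment = 0;                       # carried over between lines
    for my $line (@lines) {
        my $kept = '';
        my $rest = $line;
        while (length $rest) {
            if ($in_comment) {
                my $end = index($rest, '*/');
                last if $end < 0;             # rest of the line is comment
                $rest       = substr($rest, $end + 2);
                $in_comment = 0;
            }
            else {
                my $start = index($rest, '/*');
                if ($start < 0) {             # no more comments on this line
                    $kept .= $rest;
                    last;
                }
                $kept      .= substr($rest, 0, $start);
                $rest       = substr($rest, $start + 2);
                $in_comment = 1;
            }
        }
        # A comment running past the end of the line swallowed its newline.
        $kept .= "\n" if $in_comment && $line =~ /\n\z/;
        push @out, $kept if $kept =~ /\S/;
    }
    return @out;
}

my @file = ("a = 1; /* set a */\n", "/*\n", "these 3 lines are comments\n", "*/\n", "b = 2;\n");
for my $line (strip_comments_from_lines(@file)) {
    print $line;    # ... processing data ...
}
```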

    Scott