Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Is there a way to remove Perl comments from a file, without messing up regular expressions within the program that use # in them?

Basically, I need to strip everything after a # on each line, unless the # does not start a comment.

Thanks

Re (tilly) 1: Removing Comments
by tilly (Archbishop) on Jan 12, 2001 at 23:57 UTC
    Try B::Deparse (see perldoc B::Deparse to find out how it works) to compile the Perl and then decompile it into a form that no longer has comments. The output source code will not match the input but should be equivalent.
Re: Removing Comments
by Adam (Vicar) on Jan 12, 2001 at 23:55 UTC
    They say that the only thing that can parse Perl is the Perl parser itself. Why? Well, you can use # inside quoted strings and regular expressions, but you can also change the delimiters around these things with q(), qq(), m(), s()(), tr()() and so on, so a # may even be the delimiter itself. My suggestion is to try the deparser:
    perl -MO=Deparse file.pl
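    The same module can also be called from inside a program. What follows is only a minimal sketch, assuming a made-up sub named sample; it uses B::Deparse->new and coderef2text, both documented in perldoc B::Deparse, to show that real comments vanish while a # inside a string survives. The exact layout of the deparsed text varies between Perl versions.

```perl
use strict;
use warnings;
use B::Deparse;

# Hypothetical sample sub (not from the thread): it contains two genuine
# comments and one '#' that lives inside a string literal.
sub sample {
    # strip me: this is a genuine comment
    my $text = "keep # this";    # strip me too
    return $text;
}

# Ask the compiler backend to turn the compiled sub back into source text.
my $deparse = B::Deparse->new;
my $body    = $deparse->coderef2text( \&sample );

# The regenerated body has no comments, but the string literal is intact.
print "sub sample $body\n";
```

    Because the text is regenerated from the compiled program, formatting and quoting style may differ from what was originally typed.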
Re: Removing Comments
by dws (Chancellor) on Jan 13, 2001 at 00:32 UTC
    You'll discover this soon enough, but you will probably have to leave at least one # line alone: #!/usr/bin/perl -w
Re: Removing Comments
by mp3car-2001 (Scribe) on Jan 13, 2001 at 04:39 UTC
    I'm not gonna write regex here, but I'll give my .02. Create a script that opens a file, runs some regexes over it, and spits the result out to a different file (in case we break something).

    If a line starts with #!, leave it alone. If a line starts with a #, it's a comment, so delete it. If you find a # elsewhere and it is preceded by a ; (possibly with whitespace in between), hack off the rest of the line. That should take care of 99% of your problems.

    I'd write it up, but I'm not at home now and don't have access to perl to test, so I won't confuse anyone with bad code. Maybe I'll drop back in and write it tomorrow though....
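    For reference, the three rules above could be sketched roughly like this. It is an untested-in-the-wild illustration of the heuristic only, not a correct comment stripper: the name strip_comments_naive is made up, it works on a list of lines rather than on files, and allowing whitespace before a full-line # is an extra assumption.

```perl
use strict;
use warnings;

# Naive, line-based comment stripper: a sketch of the heuristic only.
# strip_comments_naive is a hypothetical name, not code from this thread.
sub strip_comments_naive {
    my @lines = @_;
    my @kept;
    for my $line (@lines) {
        # Rule 1: a line starting with #! is left alone.
        if ( $line =~ /^#!/ ) {
            push @kept, $line;
            next;
        }

        # Rule 2: a line starting with # is a comment, so drop it.
        # (Allowing leading whitespace is an assumption of this sketch.)
        next if $line =~ /^\s*#/;

        # Rule 3: a # preceded by ; (optional whitespace between) starts a
        # trailing comment; cut it off and keep the semicolon.
        $line =~ s/;\s*#.*$/;/;
        push @kept, $line;
    }
    return @kept;
}

# Small usage example: prints the shebang line and the assignment only.
print strip_comments_naive(
    "#!/usr/bin/perl -w\n",
    "# a full-line comment\n",
    "my \$x = 1; # set x\n",
);
```

    Rule 3 looks only at the characters on the line, so it cannot tell a ; # inside a string or regex from a real trailing comment.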

      "If it starts with #!, leave it alone. If the line starts with a #, it's a comment so delete it. If you find a # elsewhere, and it is preceded by a ; (possibly with whitespace in between), hack off the end of the line. That should take care of 99% of your problems."
      This doesn't even begin to solve the problem. That won't handle this very legal example:
      #!perl -w
      use strict; # Always!

      # Call method foo like this:
      # my @results = foo( $arg1, \%hash ); # foo in array context
      sub foo {
          my ( $arg, $hashref ) = @_;
          $arg =~ m/some regex # with true embedded comments
                    on multiple lines/x;
          $arg =~ m/some regex with a # sign in it./;
          $arg =~ m/some regex with a ; # combo in it./;
          my $result1 = "a string with; # in it";
          my $result2 = q; # nasty!;;
       # this comment has ' ' as the first char.
          return ( $result1, $result2 ) # no semi-colon!
      }
      And it gets worse, much worse. I didn't even mention the backslash escape problems.
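      To make the failure concrete, here is a small sketch that applies the "; followed by #" rule to three lines taken from the example. The substitution is one possible rendering of that rule, an assumption on my part rather than code anyone posted.

```perl
use strict;
use warnings;

# Three lines taken from the example: a string, a regex, and a real comment.
my @lines = (
    'my $result1 = "a string with; # in it";',
    '$arg =~ m/some regex with a ; # combo in it./;',
    'return ( $result1, $result2 ) # no semi-colon!',
);

# The "; then #" rule written as a single substitution (an assumed
# rendering of the heuristic, not code from the thread).
my @stripped;
for my $line (@lines) {
    ( my $copy = $line ) =~ s/;\s*#.*$/;/;
    push @stripped, $copy;
}

# Show each line before and after the rule is applied.
for my $i ( 0 .. $#lines ) {
    print "before: $lines[$i]\n";
    print "after:  $stripped[$i]\n\n";
}
```

      The first two lines lose part of a string and part of a regex, while the third keeps its comment because no ; precedes the #.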
      I now have a similar problem in front of me - to recognize comments within perl scripts. I'm reading my own Perl code, and only a small fraction of my comments (I have many comments, I make sure my code is readable) are preceded by a ';'