#!perl
use strict;

unless(scalar(@ARGV)==2){
	print "\n$0 strips comment lines beginning with # from perl code";
	print "\n - lines that begin with #! aren't stripped\n - use with caution, may strip more than comments!\n";
	print "\nusage: perl unc.pl infile outfile";
	print "\n       set outfile to \"display\" to just show result\n";
	exit
	}
my$infile  = shift;
my$outfile = shift;

open(IN,"< $infile") or die "\n $infile not found \n";
my@in = <IN>;
close(IN) or die "$!";

my%code = ();
# initialize both counters (a bare "= 0" would leave $c undefined)
my($d,$c) = (0,0);
my$t=scalar(@in);
for(@in){
	unless($_=~/^(\s+)?#[^!]/){
		$code{$c}=$_;
		$c++
		}
	else{$d++}
	}
my@out = sort {$a <=> $b} keys %code;

if($outfile eq 'display'){
	for(@out){
		print $code{$_}
		}
	&sum();
	exit
	}
open(OUT,"> $outfile") or die "\n $outfile write error \n";
for(@out){
	print OUT $code{$_}
	}
close(OUT) or die "$!";
&sum();
exit;

sub sum{
print qq~\n $t lines read from $infile\n~;
print qq~ $d comment lines detected in $infile\n~;
print qq~ $c lines written to $outfile\n~;
}

Replies are listed 'Best First'.
Re: strip perl comment lines
by repson (Chaplain) on Feb 07, 2001 at 17:06 UTC
    I didn't like a number of points of style in your code, so I fiddled with it. There are more changes than I feel like listing, but you can tell what they are yourself. So I'll just show my changed (not necessarily improved) version and you can decide if it's an (?:dis)?improvement.
    #!perl -w
    use strict;

    unless (@ARGV==2) {
        die <<EOF;
    $0 strips comment lines beginning with # from perl code
    usage: $0 infile outfile
    set outfile to \"display\" to just show result
    EOF
    }

    my $infile  = shift;
    my $outfile = shift;

    my $full_code  = '';
    my $bad_lines  = 0;
    my $good_lines = 0;

    open(IN,"< $infile") or die "Can't open $infile: $!\n";
    while (<IN>) {
        if ( /^\s*#[^!]/ ) { # if comment line
            $bad_lines++;
        } else {
            $good_lines++;
            $full_code .= $_; # add to code
        }
    }
    close(IN) or die "Can't close $infile: $!\n";

    if ($outfile eq 'display') {
        print $full_code;
    } else {
        open(OUT,"> $outfile") or die "Can't write to $outfile: $!\n";
        print OUT $full_code;
        close(OUT) or die "Can't close $outfile: $!";
    }

    print ($good_lines+$bad_lines) . "lines read from $infile\n",
          "$bad_lines comment lines detected in $infile\n",
          "$good_lines lines written to $outfile\n";
      Thanks for the lesson repson. I see the wisdom of your changes:

      • replaced multiple print statements with here docs
      • eliminated useless scalar use
      • replaced the hash with string concatenation via assignment op
      • one while instead of for loops, sorting and subroutine
      • more elegant flow eliminates need for multiple exits

      But the final print statement:

      print ($good_lines+$bad_lines) . "lines read from $infile\n",
            "$bad_lines comment lines detected in $infile\n",
            "$good_lines lines written to $outfile\n";
      Failed on 5.00503 with the warning:
      print (...) interpreted as function at unc_rep.pl line 40.
      So /me sweeps up around the monastery:
      my$lines=($good_lines+$bad_lines);
      print "$lines lines read from $infile\n",
            "$bad_lines comment lines detected in $infile\n",
            "$good_lines lines written to $outfile\n";
      ps - fixed omission of the 2nd $0 in the original.
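The warning arises because Perl treats a parenthesis directly after print as the start of print's argument list, so only the sum is printed and the rest of the expression is applied to print's return value. The sketch below shows one way to sidestep that; the summary helper and the sample values are made up for illustration and are not from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): build the summary as one string
# so the caller's print never starts with an opening parenthesis.
sub summary {
    my ($good_lines, $bad_lines, $infile, $outfile) = @_;
    my $lines = $good_lines + $bad_lines;
    return "$lines lines read from $infile\n"
         . "$bad_lines comment lines detected in $infile\n"
         . "$good_lines lines written to $outfile\n";
}

# print (A) . B, C   is parsed as   (print(A)) . B, C
# so only A reaches the output. A leading unary plus is another way to keep
# the parentheses from being taken as print's argument list:
#   print +($good_lines + $bad_lines) . " lines read\n";

print summary(3, 2, 'in.pl', 'out.pl');
```

Computing the total into a variable first, as in the fix above, has the same effect: the argument list no longer begins with a parenthesis.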
Re: strip perl comment lines
by danger (Priest) on Feb 07, 2001 at 21:02 UTC

    merlyn beat me to some of these comments (I'm only on my second cup of coffee), but I would add one thing: if you think it unlikely that here-docs (or multiline quoted strings) will contain lines that look like comments, I know I have such programs. Additionally, a # character can be used as a delimiter for one of the quoting or regex operators:

    $string =~ m# (some pattern) #x;

    If I were to do this I would take merlyn's suggestion of using '-' (but also allow the second argument to be optional), open the output handle up front (using $! in the error message) and take care of output right in the while loop so we don't need to build up the output in memory -- something along the lines of:

    #!/usr/bin/perl -w
    use strict;

    die <<USAGE unless @ARGV and @ARGV <= 2;
    $0 strips comment lines beginning with # from perl code
    usage: perl $0 infile [outfile]
    (output to stdout if no outfile given)
    USAGE

    my $infile  = shift;
    my $outfile = shift || '-';

    open(IN,"< $infile") or die "Couldn't open $infile: $!";
    open(OUT, ">$outfile") or die "Couldn't open $outfile: $!";

    my ($code, $comments) = (0,0);
    while(<IN>) {
        # pre-increment, so the test is true for the first comment line too
        ++$comments and next if /^\s*#[^!]/;
        print OUT;
        $code++
    }
    close IN;
    close OUT;

    my $total = $code + $comments;
    print<<SUMMARY;
    $total lines read from $infile
    $comments comment lines detected in $infile
    $code lines written to $outfile
    SUMMARY

    But, in reality, I wouldn't really do this because it is destined to fail on some Perl code for reasons already given, and we haven't even mentioned accidentally stripping things that look like comments in POD sections.
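To make the failure modes concrete, here is a small sketch that applies the thread's line test to a few lines that are not comments. The looks_like_comment name and the sample lines are invented for illustration; only the regex comes from the posts above.

```perl
use strict;
use warnings;

# The line test used in this thread: optional whitespace, '#', then any
# character other than '!'.
sub looks_like_comment {
    my ($line) = @_;
    return $line =~ /^\s*#[^!]/ ? 1 : 0;
}

# Sample input (made up for illustration). Only the last line is a comment,
# yet the here-doc data and the POD text are flagged as well.
my @sample = (
    "#!/usr/bin/perl -w\n",
    "print <<EOT;\n",
    "# data inside a here-doc\n",
    "EOT\n",
    "=pod\n",
    "\n",
    "  # indented literal text in POD\n",
    "\n",
    "=cut\n",
    "# a real comment\n",
);

for my $line (@sample) {
    printf "%-6s %s", (looks_like_comment($line) ? 'strip' : 'keep'), $line;
}
```

The test needs one character after the #, so a lone # on the very last line of a file with no trailing newline is kept rather than stripped.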

      Meanwhile, back at the ranch, there's a one-liner to be built around:
      use Regexp::Common;
      and
      s/$RE{comment}{Perl}//;
      if you want one. Another Conway special.

        On the off chance that you aren't merely kidding around and haven't looked at the module in question, I feel compelled to point out a couple of things. Not only does Regexp::Common's Perl comment matcher (which is essentially this re: /#[^\n]*\n/) suffer from the various problems listed previously, but your example use also doesn't come close to what epoptai was originally trying to do (which was to just strip lines beginning with optional whitespace and a # character, with the exception of the shebang line). Your example of:

        s/$RE{comment}{Perl}//;

        would delete everything from *any* # character (comment or not) to the end of the line (including the newline character) and turn the following code:

        #!/usr/bin/perl -w
        use strict;
        $_ = "blah # blah";
        s#blah #boog#gx;
        print;

        into:

        use strict; $_ = "blah s print;

        Which is useful only insofar as it demonstrates the problems inherent with simple attempts to strip Perl comments.
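The difference between the two approaches can be sketched line by line. This sketch assumes the pattern quoted above, written out by hand rather than taken from Regexp::Common, and the function names are invented for illustration.

```perl
use strict;
use warnings;

# Unanchored: delete from any '#' through the end of the line.
sub strip_any_hash {
    my ($line) = @_;
    $line =~ s/#[^\n]*\n//;
    return $line;
}

# Anchored: drop the whole line only if it starts with optional whitespace
# and '#', and is not a shebang line.
sub strip_comment_line {
    my ($line) = @_;
    return $line =~ /^\s*#[^!]/ ? '' : $line;
}

# Make newlines visible when printing the results.
sub show {
    my ($text) = @_;
    $text =~ s/\n/\\n/g;
    return "[$text]";
}

my @code = (
    '#!/usr/bin/perl -w' . "\n",
    '$_ = "blah # blah";' . "\n",
    's#blah #boog#gx;' . "\n",
);

for my $line (@code) {
    print "unanchored ", show(strip_any_hash($line)),
          "  anchored ", show(strip_comment_line($line)), "\n";
}
```

The anchored test leaves all three lines alone, while the unanchored substitution damages each of them. Neither is safe in general, for the reasons already given in this thread.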

Re: strip perl comment lines
by merlyn (Sage) on Feb 07, 2001 at 19:55 UTC
    Besides the other comments already in this thread, let me add a few of my own:
    • If you used the name - instead of display, you wouldn't need to special case anything in the code. It'd just work. Remember, when possible, stay with normal conventions, and then the programming support comes cheap or free.
    • You might be stripping more than just comments. You'll also be killing any line that starts with # inside a here-doc. Solving this problem is impossible for the general case, however. So document it as a limitation.
    • You have an undocumented feature of not stripping any lines that begin with #!. Either update the code or the docs.

    -- Randal L. Schwartz, Perl hacker
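The first point can be sketched as follows, assuming the two-argument form of open, which treats ">-" as standard output. The open_output helper is hypothetical and not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: open an output handle, treating a missing name or '-'
# as standard output. Two-argument open interprets ">-" as STDOUT, so the
# code needs no special case. Two-argument open also interprets other
# special characters in the name, so this suits trusted command-line
# arguments only.
sub open_output {
    my ($outfile) = @_;
    $outfile = '-' unless defined $outfile && length $outfile;
    open(my $out, ">$outfile") or die "Couldn't open $outfile: $!";
    return $out;
}

# Runs only when a destination is given on the command line.
if (@ARGV) {
    my $out = open_output($ARGV[0]);
    print $out "written to the chosen destination\n";
    close($out) or die "Couldn't close output: $!";
}
```

Run with - as the argument, the line goes to the terminal; run with a file name, it goes to that file, and the rest of the program is the same in both cases.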

      Thanks merlyn. I knew that my strategy would prevent saving a file named 'display', but I wasn't aware of the superior '-' convention. I don't have much experience coding console apps so bear with me. To preserve the context of the replies i won't update the logic of my original code, but did include the warnings you suggested.

      Generally, i'm not suggesting that stripping perl comments is a good idea, because it eliminates very valuable documentation and optional lines of code, and is a highly dubious coding situation (parsing perl). However i find it a useful hack when trying to comprehend heavily commented cargo. In some cases it can reduce the script size by 50% and bring the flow of logic into sharper focus.

      To future readers of this thread, if you need a perl comment stripper i recommend upgrading to danger's more beautiful and efficient version below.

      It's not impossible if you base your comment-remover on the perltidy program. Not sure how you would do that, though.

      japhy -- Perl and Regex Hacker
        No, it's impossible. You can't parse an arbitrary Perl program statically and assign meaning to every token. I've demonstrated that here in my now famous "On Parsing Perl" note. You can get arbitrarily close but there will always be that gap.

        -- Randal L. Schwartz, Perl hacker
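A commonly cited illustration of that gap, sketched here under the assumption that it captures the point being made (it is not quoted from the note mentioned above): the same line of source is either a division followed by a comment, or a regex match followed by a live die, depending on how the sub was declared earlier. The line_dies helper and the package names are invented for the demonstration.

```perl
use strict;
use warnings;

# One line of source whose meaning depends on how whatever() was declared.
my $line = 'whatever / 25 ; # / ; die "this dies!";';

# Compile and run $line after the given declaration, each time in its own
# package. Returns 1 if the die() inside $line was executed, 0 otherwise.
sub line_dies {
    my ($package, $declaration) = @_;
    local $_ = '';    # target of the match in the regex reading
    my $ok = eval "package $package; no warnings; $declaration\n$line\n1;";
    return 0 if $ok;
    die $@ unless $@ =~ /this dies!/;    # any other failure is unexpected
    return 1;
}

# Empty prototype: whatever takes no arguments, so '/' is division and the
# rest of the line is a comment.
print "with prototype:    ", line_dies('DemoA', 'sub whatever() { 1 }'), "\n";

# No prototype: whatever takes a list, so '/ 25 ; # /' is a regex argument
# and the die() that follows is live code.
print "without prototype: ", line_dies('DemoB', 'sub whatever { 1 }'), "\n";
```

Since a declaration can itself be generated at compile time by arbitrary code, a tool that only reads the text cannot always know which reading applies, and so cannot always know whether the # starts a comment.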