strip perl comment lines

Re: strip perl comment lines by repson (Chaplain) on Feb 07, 2001 at 17:06 UTC
I didn't like a number of points of style in your code, so I fiddled with it. There are more changes than I feel like listing, but you can tell what they are yourself. So I'll just show my changed (not necessarily improved) version and you can decide if it's an (?:dis)?improvement.

    #!perl -w
    use strict;

    unless (@ARGV==2) {
        die <<EOF;
    $0 strips comment lines beginning with # from perl code
    usage: $0 infile outfile
    set outfile to \"display\" to just show result
    EOF
    }

    my $infile = shift;
    my $outfile = shift;
    my $full_code = '';
    my $bad_lines = 0;
    my $good_lines = 0;

    open(IN,"< $infile") or die "Can't open $infile: $!\n";
    while (<IN>) {
        if ( /^\s*#[^!]/ ) { # if comment line
            $bad_lines++;
        } else {
            $good_lines++;
            $full_code .= $_; # add to code
        }
    }
    close(IN) or die "Can't close $infile: $!\n";

    if ($outfile eq 'display') {
        print $full_code;
    } else {
        open(OUT,"> $outfile") or die "Can't write to $outfile: $!\n";
        print OUT $full_code;
        close(OUT) or die "Can't close $outfile: $!";
    }

    print ($good_lines+$bad_lines) . "lines read from $infile\n",
          "$bad_lines comment lines detected in $infile\n",
          "$good_lines lines written to $outfile\n";
Re: Re: strip perl comment lines by epoptai (Curate) on Feb 07, 2001 at 17:48 UTC
Thanks for the lesson repson. I see the wisdom of your changes:

- replaced multiple print statements with here docs
- eliminated useless scalar use
- replaced the hash with string concatenation via assignment op
- one `while` instead of `for` loops, sorting and subroutine
- more elegant flow eliminates need for multiple exits

But the final print statement:

    print ($good_lines+$bad_lines) . "lines read from $infile\n",
          "$bad_lines comment lines detected in $infile\n",
          "$good_lines lines written to $outfile\n";

Failed on 5.00503 with the warning:

    print (...) interpreted as function at unc_rep.pl line 40.

So /me sweeps up around the monastery:

    my $lines = ($good_lines + $bad_lines);
    print "$lines lines read from $infile\n",
          "$bad_lines comment lines detected in $infile\n",
          "$good_lines lines written to $outfile\n";

ps - fixed omission of the 2nd `$0` in the original.
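The warning comes from how Perl parses `print (...)`: the parenthesised part is taken as the complete argument list, so anything after the closing parenthesis is not printed. Below is a minimal sketch of the pitfall and two common ways around it. The counts are made-up values for illustration, not taken from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my ($good_lines, $bad_lines) = (7, 3);   # made-up counts for illustration

# Pitfall (left as a comment so this script runs cleanly under warnings):
#     print ($good_lines+$bad_lines) . " lines read\n";
# Perl reads that as (print($good_lines+$bad_lines)) . " lines read\n",
# so only the number reaches the output.

# Fix 1: compute the total first, then interpolate it (the approach above).
my $lines = $good_lines + $bad_lines;
print "$lines lines read\n";

# Fix 2: wrap the whole argument in an outer pair of parentheses so that
# print receives the concatenated string.
print(($good_lines + $bad_lines) . " lines read\n");
```

Both fixes print the same line. The first is usually easier to read when the total is needed more than once.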
Re: strip perl comment lines by danger (Priest) on Feb 07, 2001 at 21:02 UTC
merlyn beat me to some of these comments (I'm only on my second cup of coffee), but I would add that, if you think it unlikely that here-docs (or multiline quoted strings) will contain comment-like lines, I know I have such programs. Additionally, it is also conceivable that a `#` character could be used as a delimiter for one of the quoting or regex operators:

    $string =~ m# (some pattern) #x;

If I were to do this I would take merlyn's suggestion of using '-' (but also allow the second argument to be optional), open the output handle up front (using `$!` in the error message) and take care of output right in the while loop so we don't need to build up the output in memory -- something along the lines of:

    #!/usr/bin/perl -w
    use strict;

    die <<USAGE unless @ARGV and @ARGV <= 2;
    $0 strips comment lines beginning with # from perl code
    usage: perl $0 infile [outfile]
    (output to stdout if no outfile given)
    USAGE

    my $infile  = shift;
    my $outfile = shift || '-';
    open(IN,"< $infile") or die "Couldn't open $infile: $!";
    open(OUT, ">$outfile") or die "Couldn't open $outfile: $!";

    my ($code, $comments) = (0,0);
    while(<IN>) {
        # pre-increment, so the test is true for the first comment line too
        ++$comments and next if /^\s*#[^!]/;
        print OUT;
        $code++
    }
    close IN;
    close OUT;

    my $total = $code + $comments;
    print<<SUMMARY;
    $total lines read from $infile
    $comments comment lines detected in $infile
    $code lines written to $outfile
    SUMMARY

But, in reality, I wouldn't really do this because it is destined to fail on some Perl code for reasons already given, and we haven't even mentioned accidentally stripping things that look like comments in POD sections.
Re: Re: strip perl comment lines by quinkan (Monk) on Mar 05, 2001 at 16:14 UTC
Meanwhile, back at the ranch, there's a one-liner to be built around `use Regexp::Common;` and `s/$RE{comment}{Perl}//;` if you want one. Another Conway special.
Re: Re: Re: strip perl comment lines by danger (Priest) on Mar 05, 2001 at 20:35 UTC
On the off chance that you aren't merely kidding around and haven't looked at the module in question, I feel compelled to point out a couple of things. Not only does Regexp::Common's Perl comment matcher (which is essentially this re: `/#[^\n]*\n/`) suffer from the various problems listed previously, but your example use also doesn't come close to what epoptai was originally trying to do (which was to just strip lines beginning with optional whitespace and a # character, with the exception of the shebang line). Your example of:

    s/$RE{comment}{Perl}//;

would delete everything from any # character (comment or not) to the end of a line (including the newline character) and turn the following code:

    #!/usr/bin/perl -w
    use strict;
    $_ = "blah # blah";
    s#blah #boog#gx;
    print;

into:

    use strict;
    $_ = "blah s print;

Which is useful only insofar as it demonstrates the problems inherent with simple attempts to strip Perl comments.
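The damage is easy to reproduce without installing the module. The sketch below uses a hand-written stand-in for the pattern as described in this post (an assumption, not the module's actual source) and applies it globally to the sample program, which is held in a string and never executed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The sample program from the post, kept in a single-quoted here-doc so that
# nothing in it is interpolated or run.
my $source = <<'END_SOURCE';
#!/usr/bin/perl -w
use strict;
$_ = "blah # blah";
s#blah #boog#gx;
print;
END_SOURCE

# Assumed stand-in for the comment pattern: a '#', everything up to the
# newline, and the newline itself.
my $naive_comment = qr/#[^\n]*\n/;

# Copy the source, then remove every match from the copy.
(my $stripped = $source) =~ s/$naive_comment//g;

print $stripped;
```

The shebang line disappears entirely, and both the string literal and the `s#...#...#` substitution are cut off at their first `#`, so what remains is no longer valid Perl.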
Re: strip perl comment lines by merlyn (Sage) on Feb 07, 2001 at 19:55 UTC
Besides the other comments already in this thread, let me add a few of my own:

- If you used the name `-` instead of `display`, you wouldn't need to special case anything in the code. It'd just work. Remember, when possible, stay with normal conventions, and then the programming support comes cheap or free.
- You might be stripping more than just comments. You'll also be killing any line that starts with `#` inside a here-doc. Solving this problem is impossible for the general case, however. So document it as a limitation.
- You have an undocumented feature of not stripping any lines that begin with `#!`. Either update the code or the docs.

-- Randal L. Schwartz, Perl hacker
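The here-doc limitation can be shown with a few lines. This is a sketch only: `strip_comment_lines` is a hypothetical helper that applies the same line test as the scripts in this thread, and the sample input is invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: drop every line that starts with optional whitespace
# and '#', except "#!" lines (the same test the scripts in this thread use).
sub strip_comment_lines {
    my ($source) = @_;
    # split /^/m keeps the trailing newline on each line
    return join '', grep { !/^\s*#[^!]/ } split /^/m, $source;
}

# Invented sample: the inner here-doc body contains a line that only looks
# like a comment. The outer here-doc is single-quoted, so it is plain text.
my $source = <<'END_SOURCE';
# a real comment
print <<MENU;
Choose an option:
# 1 - start
MENU
END_SOURCE

my $stripped = strip_comment_lines($source);
print $stripped;
```

The real comment is removed as intended, but so is the `# 1 - start` line, which is data rather than a comment. A line-based filter cannot tell the two apart.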
Re: Re: strip perl comment lines by epoptai (Curate) on Feb 08, 2001 at 02:27 UTC
Thanks merlyn. I knew that my strategy would prevent saving a file named 'display', but I wasn't aware of the superior '-' convention. I don't have much experience coding console apps, so bear with me. To preserve the context of the replies I won't update the logic of my original code, but I did include the warnings you suggested.

Generally, I'm not suggesting that stripping perl comments is a good idea, because it eliminates very valuable documentation and optional lines of code, and is a highly dubious coding situation (parsing perl). However, I find it a useful hack when trying to comprehend heavily commented cargo. In some cases it can reduce the script size by 50% and bring the flow of logic into sharper focus.

To future readers of this thread: if you need a perl comment stripper, I recommend upgrading to danger's more beautiful and efficient version below.
Re: Re: strip perl comment lines by japhy (Canon) on Feb 07, 2001 at 19:58 UTC
It's not impossible if you base your comment-remover on the `perltidy` program. Not sure how you would do that, though.

`japhy` -- Perl and Regex Hacker
Re: Re: Re: strip perl comment lines by merlyn (Sage) on Feb 07, 2001 at 20:03 UTC
No, it's impossible. You can't parse an arbitrary Perl program statically and assign meaning to every token. I've demonstrated that here in my now famous "On Parsing Perl" note. You can get arbitrarily close but there will always be that gap.

-- Randal L. Schwartz, Perl hacker
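A commonly cited illustration of this gap is a line whose `#` is either a comment or part of a regex, depending on how a sub was declared earlier. The sketch below is an assumption-laden demo rather than a proof: the package names `Nullary` and `ListOp` are made up, and the same source line is compiled once in each package with a string `eval`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two hypothetical packages that declare `whatever` differently.
{ package Nullary; sub whatever() { 50 } }         # empty prototype: takes no arguments
{ package ListOp;  sub whatever   { scalar @_ } }  # no prototype: takes a list

# The same source text is compiled in each package.
my $line = 'whatever / 25 ; # / ; die "this dies!";';

my %outcome;
for my $pkg (qw(Nullary ListOp)) {
    local $_ = '';   # the list-operator parse matches a regex against $_
    my $value = eval "package $pkg; $line";
    $outcome{$pkg} = $@ ? 'died' : "returned $value";
    print "$pkg: $outcome{$pkg}\n";
}
```

In `Nullary` the line is a division followed by a comment. In `ListOp` the text `/ 25 ; # /` is a regex passed to `whatever`, and the `die` is live code. A comment stripper would have to know which declaration is in effect, and in general that can depend on code that only runs at compile time.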