PerlMonks  

Comment Stripper script for unix

by hsinclai (Deacon)
on Jun 14, 2004 at 01:55 UTC ( [id://366388]=sourcecode )
Category: Utility Scripts
Author/Contact Info devel @hastek.com
Description: e.pl
invoke as "e" or "ee"
Comment stripper for unix, useful during system administration. Removes blank lines, writes output file, strips "#" or ";". Tries to preserve shell scripts.
Please see the POD
#!/usr/bin/perl -w

#     e.pl   (invoke as e or ee)
#            Please see the POD for install and licensing details

use strict;

###### globals
my $version = "0.9";
my $comm;
my @stripped;
my $topline;


######  how we were called
chomp(my $us = qx!basename $0!);
if ( $us eq "ee" ) { $comm = ';'; } else { $comm = '#'; }


######  parse args
$#ARGV >= 2 && die("\n No more than 2 arguments\n\n"); 
defined $ARGV[0] || die(&usage($us));
my $ifile=$ARGV[0];
-e $ifile || die("\n Input file nonexistent.\n\n");

open(IFIL,"<$ifile") or die("problem opening input_file");
my @inputfile=<IFIL>;
close(IFIL); 


######  main
if ( $us eq "ee" ) {
   $topline = shift(@inputfile);
   die(&pwarn($comm)) if $topline =~ /\#\!.*perl/i ;
   unshift(@inputfile,$topline);
   &stripper(@inputfile);
} elsif ( $us eq "e" ) {
     $topline = shift(@inputfile);
     if ( $topline =~ /(\s+)\#\!/ ) {
        &stripper(@inputfile);
        unshift(@stripped,$topline);
       } else {
        unshift(@inputfile,$topline);
        &stripper(@inputfile);
     }
  } 


######  final output
if ( $ARGV[1] ) {
    open(OFIL,">$ARGV[1]") or die("problem creating output_file"); 
    for ( @stripped ) { print OFIL "$_\n"; }
    print "\n Done stripping $ifile\n     -\>  wrote output file \"$ARGV[1]\"\n\n";
    close(OFIL);
} else {
    for ( @stripped ) { print "$_\n"; }
  }
exit $?;




######  subs

sub stripper {
    for ( @_ ) {
        chomp;
        next if /^$comm|^(\s*)$comm|^(\s*)$/;
        $_ =~ s/$comm.*$//;
        push(@stripped,$_);
    }
    return @stripped;
}

sub usage {
 print qq[
   Usage:   e filename [outputfilename]
            ································································
            e strips comments and blank lines from an existing file.
            e to remove # comments, and ee to strip ; comments.
            
            See "perldoc e.pl"
            ································································
            e.pl v$version                                        invoked as \'$us\'

]; 
exit(1);
}

sub pwarn {
 print  qq[
 WARNING:   Input file "$ifile" looks like a Perl script
            
            The first line was:   $topline
            When invoked as \'$us\', e.pl strips out semicolons,
            which might not be very useful for looking at a Perl script.
            If this assumption is wrong, remove the first line temporarily.


];
&usage;
exit(1);
}


__END__

=head1 NAME


e (and ee), symbolic links to e.pl



=head1 VERSION


Version 0.9



=head1 SYNOPSIS


 e   (e.pl, to be invoked as either "e" or "ee")

 e   args
ee   args




=head1 DESCRIPTION


B<e> (invoked as "e" or "ee") is a small program to strip unix style comments ( e.g., "#" or ";" ) from scripts and configuration files. It might be useful during system administration. It is called "e" simply for brevity.

B<e> also removes blank lines, makes some effort not to destroy shell scripts and shebangs, and tries to avoid mangling Perl scripts it encounters.

B<e> is meant to be run on Unix systems where #, #!, and ; are common comments/patterns.

B<e> requires at least one argument, a filename to be processed.

B<e> tries to detect if the first line of the input file contains the #! character sequence, and tries to preserve it, assuming it might be a shell script.

B<e> will stop and warn you about removing semi-colons from a file it thinks is a Perl script.




=head1 INSTALLATION


Install the main file, e.pl, somewhere in your path, then in the same directory, do

  ln -s e.pl e
  ln -s e.pl ee

Use e or ee, depending on what character you want to strip.

Invoking e.pl directly does not work: the script only strips anything when it is called as e or ee.

If you already have an e or ee on your system, you may use other symbolic link names, but if you rename these files, you will have to adjust the main script accordingly.


=head1 EXAMPLES


=over 4

=item B<e> I<input_filename>

Strips # comments and blank lines out of "input_filename" and sends the result to your screen.



=item B<e> I<input_filename> [I<output_filename>] 

Same as above, but the result will be written to a new file "output_filename" in the current directory.


=item B<ee> I<input_filename> [I<output_filename>] 

Same as above, but semicolon as the comment character.

=back



=head1 BUGS

Might not be able to preserve the shebang line in a shell script, when the shebang line is preceded by one or more blank lines.



=head1 LIMITATIONS

Does not remove C style comments.

Inefficiently written, so uses lots of memory when input files get larger.

Cannot detect a "here" document, and will happily destroy the contents of one when it encounters a comment character somewhere in there.


=head1 AUTHOR

Harold Sinclair
devel at hastek


=head1 COPYRIGHT

Copyright ©2004 hastek. All rights reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.


=cut

#EOF
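
The script picks its comment character from the name it was invoked under, using qx!basename $0!. The same decision can be sketched without spawning a shell. This is only an illustration: comment_char_for is a hypothetical helper that is not part of e.pl, and it assumes the core File::Basename module is acceptable here.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical helper, not part of e.pl: choose the comment character
# from the name the script was invoked under.
sub comment_char_for {
    my ($path) = @_;
    my $name = basename($path);
    return ';' if $name eq 'ee';
    return '#' if $name eq 'e';
    return undef;    # any other name, e.g. "e.pl", is not recognised
}

# Illustrative paths only; in a real script the argument would be $0.
for my $path ('/usr/local/bin/e', '/usr/local/bin/ee', './e.pl') {
    my $comm = comment_char_for($path);
    printf "%-20s %s\n", $path,
        defined $comm ? "strips '$comm'" : 'unrecognised name';
}
```

Returning undef for an unknown name lets the caller print a usage message instead of silently producing no output, which is what happens when e.pl is run under its own name.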
Replies are listed 'Best First'.
Re: Comment Stripper script for unix
by Zaxo (Archbishop) on Jun 14, 2004 at 02:49 UTC

    I tried applying this script to itself. That was to check if significant uses of '#' were handled properly. The results were, uhhh . . . unfortunate.

    1. It stripped the shebang line, which doesn't look exotic at all.
    2. It did
      -if ( $us eq "ee" ) { $comm = ';'; } else { $comm = '#'; }
      +if ( $us eq "ee" ) { $comm = ';'; } else { $comm = '
      leaving an unclosed quote in the code.
    3. It did
      - die(&pwarn($comm)) if $topline =~ /\#\!.*perl/i ;
      + die(&pwarn($comm)) if $topline =~ /\
      leaving an open regex match.
    4. It did
      - if ( $topline =~ /(\s+)\#\!/ ) {
      + if ( $topline =~ /(\s+)\
      to the same effect.

    I think your e can only be applied in the simplest circumstances.

    Don't feel too bad, the saying goes, "Only perl can parse Perl." To do this sort of thing properly really does require a parser.

    After Compline,
    Zaxo
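
The failures listed above all come from treating every '#' as a comment start. A rough sketch of a slightly more careful line scanner follows. It assumes shell-like rules: a comment character counts only outside quotes, when not backslash-escaped, and at the start of the line or after whitespace. strip_comment is a hypothetical helper, not part of e.pl, and it is still no substitute for a real parser.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of e.pl: remove a trailing comment from one
# line. $comm starts a comment only outside quotes, when not escaped by a
# backslash, and when at line start or preceded by whitespace.
sub strip_comment {
    my ($line, $comm) = @_;
    my $quote = '';                 # currently open quote character, if any
    my $out   = '';
    my @chars = split //, $line;
    for (my $i = 0; $i < @chars; $i++) {
        my $c = $chars[$i];
        if ($quote) {
            # Inside double quotes a backslash protects the next character.
            if ($c eq '\\' && $quote eq '"' && $i < $#chars) {
                $out .= $c . $chars[++$i];
                next;
            }
            $quote = '' if $c eq $quote;    # closing quote
        }
        elsif ($c eq '\\' && $i < $#chars) {
            $out .= $c . $chars[++$i];      # escaped character, copied as is
            next;
        }
        elsif ($c eq '"' || $c eq "'") {
            $quote = $c;                    # opening quote
        }
        elsif ($c eq $comm && ($out eq '' || $out =~ /\s\z/)) {
            last;                           # real comment: drop the rest
        }
        $out .= $c;
    }
    $out =~ s/\s+\z//;                      # trim whitespace left before the comment
    return $out;
}

# Lines taken from the discussion, plus one with a genuine trailing comment.
for my $line (q|if ( $us eq "ee" ) { $comm = ';'; } else { $comm = '#'; }|,
              q|ls -l   # list files|) {
    print strip_comment($line, '#'), "\n";
}
```

This handles the quoted and escaped cases reported here, but it knows nothing about here-documents, multi-line strings, or Perl quoting operators such as q{} and m##.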

      Don't feel too bad, the saying goes, "Only perl can parse Perl." To do this sort of thing properly really does require a parser.
      ... or take a look at perltidy, which does a really good job on perl code formatting and also has a switch for stripping comments.

      Whoa - that's terrible - obviously I didn't test it with Perl scripts enough - I only used it with config files and shell scripts really - way too hasty ...

      This plain doesn't work and should be removed from the code catacombs - you all are too kind! Or maybe moved to the "don't let this happen to you" section?

      I didn't know Perltidy removed comments, so thanks for that eserte.

Re: Comment Stripper script for unix
by Abigail-II (Bishop) on Jun 14, 2004 at 15:06 UTC
    Input:
      #!/bin/bash
      # This is a comment.
      echo "# This is not a comment"
      echo \# and neither is this.
    Output:
      echo "
      echo \
    Your program will strip she-bang lines unless such a line starts with whitespace. However, leading whitespace isn't allowed there: the first 2 bytes of the file need to be #!, and the kernel isn't going to skip over whitespace (and whitespace certainly isn't mandatory). Furthermore, the base of your program is an extremely simplistic regex - it just removes anything on a line starting at the first #. Your program could as well have been:
    perl -nle 's/#.*//; print if /\S/'

    But my biggest question is, why do you think this is useful for system administration? I don't know any system administrator who wants to remove comments from his configuration files or from his shell scripts.

    Abigail
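
The two-byte rule for the she-bang can be sketched as follows. This keeps the naive s/#.*// rule from the one-liner unchanged and only shows the first-line check; strip_lines is a hypothetical helper, not part of e.pl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of e.pl: strip comments with the same naive
# rule as the one-liner, but keep a genuine she-bang line.
sub strip_lines {
    my @lines = @_;
    my @out;
    # A she-bang counts only if "#!" are the first two bytes of the file.
    if (@lines && substr($lines[0], 0, 2) eq '#!') {
        my $first = shift @lines;
        chomp $first;
        push @out, $first;
    }
    for my $line (@lines) {
        chomp(my $copy = $line);
        $copy =~ s/#.*//;                    # naive: cut at the first '#'
        push @out, $copy if $copy =~ /\S/;   # drop lines left blank
    }
    return @out;
}

# The input from the post above, as a list of lines.
my @demo = strip_lines(
    "#!/bin/bash\n",
    "# This is a comment.\n",
    "echo \"# This is not a comment\"\n",
    "echo \\# and neither is this.\n",
);
print "$_\n" for @demo;
```

The she-bang now survives, but the two echo lines are still cut at the first '#', so the quoting problem remains.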

      This is an annoying trend that's driving me nuts where I work too. Somehow they are justifying it in the name of security (even to the point of stripping comments from all applications).

        I tend to ask people to elaborate on that, and ask them to explain how this is helping security. I also might point out that $ > /secret/file works even better (sure, it has some side-effects, but isn't security important enough that we can justify some side-effects?)

        Abigail

      Hi Abigail,

      Your program will strip she-bang lines unless such a line starts with whitespace.
      Are you sure about that? The shebang line is not stripped if it is the first line, which gets preserved and re-inserted back into the final output.
      Update: you're totally right about that, I screwed it up.

      why do you think this is useful for system administration...

      Because removing commented lines lets you get a quick view only of active lines - in a file that might have only a few active lines among several screens of commented lines, e.g. a stock squid.conf file.

      Thanks for the feedback!
        Because removing commented lines lets you get a quick view only of active lines - in a file that might have only a few active lines among several screens of commented lines, e.g. a stock squid.conf file.
        Well, a simple grep -v ^\# will do that. If an "active" line has a trailing comment, it doesn't matter. It also doesn't explain why you want to remove comments from a shell script.

        Abigail
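
For comparison, the same filter as grep -v ^\# can be written in Perl. This is a minimal sketch: active_lines is a hypothetical name, and the sample lines are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: keep every line that does not start with '#',
# which is what grep -v '^#' does. Trailing comments and blank lines stay.
sub active_lines {
    return grep { !/^#/ } @_;
}

# Made-up sample input in the style of a config file.
my @sample = (
    "# This line is a comment\n",
    "http_port 3128 # trailing comment is kept\n",
    "\n",
);
print active_lines(@sample);
```

Unlike e, this never edits a line, so a '#' inside a value or a quoted string cannot be damaged.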
