Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl-Sensitive Sunglasses
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??
I am trying to write a perl script which can compress a perl script without breaking it. Its job is to remove comments, new line characters, and unnecessary whitespace.

I have gotten so far that I can remove most comments and spaces, but I don't know how to interpret lines which contain regex and quotes and double-quotes. Everybody knows that I can't just chop off everything that comes after the '#' character, because if it's within a print statement, then the script no longer works:

  print "\nHello World ###";
  return;
}

would become:

print"\nHello Worldreturn;}

So, I need to come up with a logic that can interpret a line of complex perl code and remove comments and spaces in such a way that it won't break the code.

I just started learning perl 3 years ago, so I am not even sure what kind of complicated code patterns may exist in a perl script. I always see something new and marvel "What in the world is this code!! I've never seen anything like it." Lol

Ok, here is what I have done so far:

#!/usr/bin/perl -w # # This perl script can compress itself by getting rid # of comments, spaces, tabs, new lines, and reducing # the perl code to its minimal size where it will # still function. # Written by Zsolt on Dec. 7, 2019. <zsnp@juno.com> # Tested on TinyPerl 5.8 (Windows XP). # # When $WRITE_FILE is true, it will save the output to # a file called OUTPUT##.pl where ## is a number. # ################################################## use strict; use warnings; my $WRITE_FILE = 0; my $SELF = ReadFile(__FILE__); my $SHRUNK = CompactPerl($SELF); my $ORIGINAL_SIZE = length($SELF); my $SHRUNK_SIZE = length($SHRUNK); my $DIFF = $ORIGINAL_SIZE - $SHRUNK_SIZE; print '>' x 78; print "\n", $SHRUNK; print '^' x 78; print "\n\nFILE NAME : ", __FILE__; print "\nFILE SIZE : $ORIGINAL_SIZE bytes"; print "\nSHRUNK SIZE : $SHRUNK_SIZE bytes"; WriteOutput($SHRUNK) if ($WRITE_FILE); if ($SHRUNK_SIZE < $ORIGINAL_SIZE) { print "\n\nYou could eliminate $DIFF bytes by compressing this scrip +t.\n"; } print <<'END_OF_MESSAGE'; Wow, this script is amazing! * * * * * * * * * * * * * * * * Yes, it really is! :D END_OF_MESSAGE exit; ################################################## # # Creates and overwrites a file in binary mode. # Returns 1 on success or 0 if something went wrong. # Usage: INTEGER = CreateFile(FILE_NAME, CONTENT) # sub CreateFile { defined $_[0] or return 0; my $F = $_[0]; $F =~ tr#\"\0*?|<>##d; # Remove special characters length($F) or return 0; local *FH; open(FH, ">$F") or return 0; binmode FH; if (defined $_[1] ? length($_[1]) : 0) { print FH $_[1]; } close FH or return 0; print "\n$F was saved.\n"; return 1; } ################################################## # # This function reads the entire contents of a file # in binary mode and returns it as a string. If an # errors occur, an empty string is returned silently. # A second argument will move the file pointer before # reading. And a third argument limits the number # of bytes to read. # Usage: STRING = ReadFile(FILENAME, [START, [LENGTH]]) # sub ReadFile { my $NAME = defined $_[0] ? $_[0] : ''; $NAME =~ tr/\"\0*?|<>//d; # Remove special characters -e $NAME or return ''; -f $NAME or return ''; my $SIZE = -s $NAME; $SIZE or return ''; my $LEN = defined $_[2] ? $_[2] : $SIZE; $LEN > 0 or return ''; local *FH; sysopen(FH, $NAME, 0) or return ''; binmode FH; my $POS = defined $_[1] ? $_[1] : 0; $POS < $SIZE or return ''; $POS < 1 or sysseek(FH, 0, $POS); # Move file ptr my $DATA = ''; sysread(FH, $DATA, $LEN); # Read file close FH; return $DATA; } ################################################## # v2019.8.25 # Removes whitespace from before and after STRING. # Whitespace is here defined as any byte whose # ASCII value is less than 33. That includes # tabs, esc, null, vertical tab, new lines, etc. # Usage: STRING = Trim(STRING) # sub Trim { defined $_[0] or return ''; my $L = length($_[0]); $L or return ''; my $P = 0; while ($P <= $L && vec($_[0], $P++, 8) < 33) {} for ($P--; $P <= $L && vec($_[0], $L--, 8) < 33;) {} substr($_[0], $P, $L - $P + 2) } ################################################## # # Saves the shrunk code in a file named output##.pl # where ## is a number. If output1.pl already # exists, then it will create output2.pl... # Usage: WriteOutput(CONTENT) # sub WriteOutput { defined $_[0] or return; length($_[0]) or return; my $OUTPUT_FILENAME; for (my $i = 1; $i < 100; $i++) { $OUTPUT_FILENAME = "compact$i.pl"; -e $OUTPUT_FILENAME or return CreateFile($OUTPUT_FILENAME, $_[0]); } } ################################################## # This function returns 1 if STRING ends with any # of the characters in LIST. # Usage: INTEGER = EndsWithChar(STRING, LIST) # sub EndsWithChar { (@_ == 2 && defined $_[0] && defined $_[1] && length($_[1]) && lengt +h($_[0])) ? (index($_[1], substr($_[0], length($_[0]) - 1)) < 0 ? 0 : + 1) : 0; } ################################################## # # This function receives one line of perl code # which contains one or more of the following: # quotes, double-quotes, a regex pattern, and # may also contain a comment at the end of the line. # The job of this function is to remove the # comments and spaces without breaking the code! # Usage: STRING = CompactLine(STRING) # sub CompactLine { defined $_[0] or return ''; length($_[0]) or return ''; my $OUTPUT = Trim($_[0]); $OUTPUT .= "\n"; # <<< THIS PART IS UNDER CONSTRUCTION >>> return $OUTPUT; } ################################################## # # This function trims whitespace from before and after # a list of perl operators... This function modifies # the first argument and does not return anything. # Usage: TrimOperators(STRING) # # Example: TrimOperators($LINE) # This will look for ' + ' in $LINE and replace it # with '+' It will then look for ' - ' and replace # it with '-' and so forth with all the characters # listed in $OP. # sub TrimOperators { defined $_[0] or return; length($_[0]) or return; # print "\nTrimOperators>", $_[0]; my $c; my $OP = '+-,.|&*/<>?:;!%^()[]{}='; for (my $i = 0; $i < length($OP); $i++) { $c = substr($OP, $i, 1); $_[0] =~ s/\s*\Q$c\E\s*/$c/g; } # print "\nTrimOperators>", $_[0], "\n"; } ################################################## # v2019.12.7 # This function takes regular perl source code # as an input string and outputs compact perl code # that has spaces and comments removed. # Usage: STRING = CompactPerl(STRING) # sub CompactPerl { defined $_[0] or return ''; my $CODE = $_[0]; $CODE =~ tr|\r||d; my @A = split(/\n/, $CODE); # Split lines my $OP = '+-,.|&*/<>?:;!%^()[]{}='; my $STRIP; # Part of a line that comes before '#' my $P; my $LINE; my $TRIMMED; my $END_MARKER = ''; my $MULTILINE_STRING = 0; my $STOP = 0; $A[0] .= "\n"; # Skip shebang for (my $i = 1; $i < @A; $i++) { $LINE = $A[$i]; # Remove everything after the '#' sign for easier processing $P = index($LINE, '#'); $STRIP = Trim( ($P < 0) ? $LINE : substr($LINE, 0, $P) ); # Skip here-documents if ($MULTILINE_STRING == 0 && $STRIP =~ m/\s*<<['"]+([_A-Za-z0-9]* +)['"]+/) { $END_MARKER = $1; $MULTILINE_STRING = 2; $A[$i] .= "\n"; n +ext; } if ($MULTILINE_STRING == 2) { index($LINE, $END_MARKER) < 0 or $MU +LTILINE_STRING = 0; $A[$i] .= "\n"; next; } # Skip processing qq{ } text blocks if ($MULTILINE_STRING == 0 && index($STRIP, 'qq{') >= 0) { $MULTIL +INE_STRING = 1; } if ($MULTILINE_STRING == 1) { if (index($LINE, '}') < 0) { $A[$i] +.= "\n"; } else { $MULTILINE_STRING = 0; } next; } # Remove everything after __END__ if ($STRIP eq '__END__') { $STOP = 1; $MULTILINE_STRING = -1; } if ($STOP == 1) { $A[$i] = ''; next; } # Skip __DATA__ section if ($MULTILINE_STRING == 0 && $STRIP eq '__DATA__') { $A[$i] = "\n +__DATA__"; $MULTILINE_STRING = 3; } if ($MULTILINE_STRING == 3) { $A[$i] .= "\n"; next; } # Remove comment lines if (length($STRIP) == 0) { $A[$i] = ''; next; } # Any line that contains the characters ' " or ~ # is going to be processed by CompactLine() instead if ($LINE =~ m/[~\"\']+/) { $A[$i] = CompactLine($LINE); next; } $TRIMMED = Trim($STRIP); $TRIMMED =~ s/([a-z])\s+\(/$1\(/g; # Reduce 'my ($A' --> 'my($ +A' $TRIMMED =~ s/([a-z])\s+\%/$1\%/g; # Reduce 'my %A' --> 'my%A +' $TRIMMED =~ s/([a-z])\s+\$/$1\$/g; # Reduce 'my $A' --> 'my$A +' $TRIMMED =~ s/([a-z])\s+\@/$1\@/g; # Reduce 'my @A' --> 'my@A +' TrimOperators($TRIMMED); # Reduce '$A + $B' --> '$A+$B' # If a line ends with a letter and the next line starts with # the word 'or' then we'll break the code if we join the two. # So, unless a line ends with an operator symbol, # we should add a space. EndsWithChar($TRIMMED, $OP) or $TRIMMED .= ' '; # Here is an example: $LINE or $LINE; $A[$i] = $TRIMMED; } return join('', @A); } ################################################## __DATA__ Everything after this point will not get compacted. It gets written .... AS IS. No change. __END__ # This is the end of file and whatever comes after this point gets discarded forever..... ............

In reply to Perl script compressor by harangzsolt33

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others pondering the Monastery: (4)
As of 2024-04-19 19:58 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found