PerlMonks  

Re: Perl script compressor

by afoken (Chancellor)
on Dec 08, 2019 at 17:33 UTC (id://11109842)


in reply to Perl script compressor

I am trying to write a perl script which can compress a perl script without breaking it. Its job is to remove comments, new line characters, and unnecessary whitespace.

Why?

Just for fun or education? That would be ok.

But I see no other reason for doing so. 180 kByte floppy disks have been gone for decades, and commonly available mass storage is in the GByte or TByte range, so there is no shortage of disk space. Perl's compile phase won't be sped up significantly by stripping whitespace and comments. Any kind of electronic transmission can be significantly accelerated by applying state-of-the-art compression (e.g. bzip2, lzma) before transmission. (Just for fun: the current CGI.pm is 123 KBytes; gzip compresses it to 36 KBytes, bzip2 and lzma even down to 32 KBytes, all without any loss of information!)
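To check figures like these on your own machine, here is a minimal sketch using the IO::Compress::Gzip and IO::Compress::Bzip2 modules that ship with core Perl. That choice is mine; the numbers above were presumably taken with the command-line tools, and LZMA is left out because it has no core module. The script name sizes.pl is just a placeholder.

```perl
use strict;
use warnings;
use IO::Compress::Gzip  qw(gzip  $GzipError);
use IO::Compress::Bzip2 qw(bzip2 $Bzip2Error);

# Return (original, gzip, bzip2) sizes in bytes for an in-memory string.
sub compressed_sizes {
    my ($data) = @_;
    my ($gz, $bz);
    gzip  \$data => \$gz or die "gzip failed: $GzipError\n";
    bzip2 \$data => \$bz or die "bzip2 failed: $Bzip2Error\n";
    return (length $data, length $gz, length $bz);
}

# Usage: perl sizes.pl /path/to/CGI.pm [more files...]
for my $file (@ARGV) {
    open my $fh, '<:raw', $file or die "Cannot open $file: $!\n";
    my $source = do { local $/; <$fh> };
    close $fh;
    printf "%s: %d bytes, gzip %d, bzip2 %d\n", $file, compressed_sizes($source);
}
```

Very small inputs can come out slightly larger after compression because of the format headers; the gains show up on files the size of CGI.pm.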

Minification and transparent compression, as is usual for jQuery and others, make sense only for code delivered to web browsers. And running Perl in a web browser is possible, but far from common.

So what's left? Creating a maintenance nightmare, just because you can?

Or did I miss something?

Alexander

--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

Re^2: Perl script compressor
by shmem (Chancellor) on Dec 08, 2019 at 18:08 UTC
    Or did I miss something?

    Probably. Many Javascript blurbs are delivered either as foo.js or foo-minimal.js, so that might be some "State Of The Art" thing. Or code obfuscation for *cough* EULA reasons. Or some such.

    perl -le'print map{pack c,($-++?1:13)+ord}split//,ESEL'
      Many Javascript blurbs are delivered either as foo.js or foo-minimal.js

      True, but I'm pretty sure JS can be parsed statically, so a reduction like that is safer than with Perl, which in general cannot be parsed without running parts of the code (BEGIN blocks, imported subs, prototypes).
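The difference is easy to demonstrate. The sketch below is a deliberately naive minifier (hypothetical, written only for illustration) that deletes everything from # to end of line and squeezes whitespace. It mistakes $#list and a # inside a string for comments, so the "minified" code no longer compiles. A safer tool would have to work from a real parse, for example the CPAN module PPI, which builds a document tree from Perl source without executing it, and even that can only approximate what perl itself would do.

```perl
use strict;
use warnings;

# Hypothetical naive "minifier": strip from '#' to end of line, then squeeze
# whitespace. Shown only to illustrate why text-level stripping breaks Perl.
sub naive_strip {
    my ($src) = @_;
    $src =~ s/#.*$//mg;    # "remove comments"
    $src =~ s/\s+/ /g;     # "remove unnecessary whitespace"
    return $src;
}

# '$#list' (last index of @list) and '#' inside a string both look like comments.
my $original = <<'PERL';
my @list = (10, 20, 30);
my $color = "#ff0000";   # a real comment
return "$#list $color";
PERL

my $stripped = naive_strip($original);

# Compile both versions as anonymous subs; eval returns undef on a syntax error.
my $before = eval "sub {\n$original\n}";
my $after  = eval "sub {\n$stripped\n}";

print 'original compiles: ', ($before ? "yes, returns '" . $before->() . "'" : 'no'), "\n";
print 'stripped compiles: ', ($after ? 'yes' : "no ($@)"), "\n";
```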

        That's mostly because JS is downloaded by the client on each request, so there are real network savings from minimizing the amount of data coming across the wire.

        That's a much different reason than simply reading a server-side script directly from disk.

Re^2: Perl script compressor
by Anonymous Monk on Dec 08, 2019 at 18:35 UTC
    Yes, I was thinking about writing a CGI script to run on a server. If I remove spaces, the OS might be able to load the whole script with one disk read, but if it's bigger, it might take two or three. So the smaller the code, the faster it loads and the more likely it is to remain in the cache; if it runs multiple times, the OS might not even have to load it from disk. That was the whole purpose of doing this. --harangzsolt33 (I'm currently not logged in)

      I suggest using strace while running your script. If your script uses strict, warnings, CGI.pm, or any other modules, they also get opened and read into memory. If your Perl is configured to use sitecustomize.pl, that will be opened and read in. And if it's been read in once, with an OS like Linux there's a chance the files are hot and ready in a cache anyway. But strace will demonstrate to you that the top level script you load is not the largest component that gets read in from a file.
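strace shows this at the system-call level. A rough in-Perl approximation, sketched below, is to list %INC, which records the path of every file loaded via use or require, together with each file's size. The modules loaded here are stand-ins for whatever a real CGI script would use. %INC does not show everything strace would (for instance the shared objects behind XS modules), so treat the total as a lower bound.

```perl
use strict;
use warnings;
# Stand-ins for the modules a real CGI script would pull in.
use POSIX ();
use List::Util qw(sum);
use File::Spec;

# %INC maps each loaded module (e.g. 'POSIX.pm') to the file it was read from.
# Return [path, size-in-bytes] pairs for those files, largest first.
sub loaded_files {
    my @files = map { [ $_, (-s $_) || 0 ] }
                grep { defined $_ && -f $_ } values %INC;
    return sort { $b->[1] <=> $a->[1] } @files;
}

my @files = loaded_files();
printf "%8d  %s\n", $_->[1], $_->[0] for @files;
printf "%8d  total in %d module files\n", sum(0, map { $_->[1] } @files), scalar @files;
printf "%8d  %s (the script itself)\n", -s $0, $0 if -f $0;
```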

      The bulk of startup time has little to do with just reading the program file in from (hopefully) an SSD. I created two Hello World scripts; one with 13368 lines, consuming 1.1 megabytes on disk, and one with seven lines, consuming 96 bytes on disk. They both start by printing "Hello world\n", and end by printing "Goodbye world\n", but in the first script there are 13361 80-column lines of comments between the two print statements. Perl must read the entire file before getting to the final Goodbye world. Here are the timings:

      $ time ./mytest.pl
      Hello world
      Goodbye world

      real    0m0.022s
      user    0m0.014s
      sys     0m0.008s

      $ time ./mytest2.pl
      Hello world
      Goodbye world

      real    0m0.008s
      user    0m0.008s
      sys     0m0.001s

      A tremendous increase, from eight milliseconds to twenty-two. We go from running 45 times per second to 125 (1/0.022 s versus 1/0.008 s) *if* the bloated script is 1 megabyte in size, and if all that bloat (including the parts that you bring in from CPAN and core-Perl libs) can be reduced to 96 bytes. What if the source script is 64 KB? Let's try that:

      $ time ./mytest.pl
      Hello world
      Goodbye world

      real    0m0.009s
      user    0m0.004s
      sys     0m0.005s

      So now we're talking about a 1-millisecond difference (9 ms versus 8 ms). Instead of 125 runs per second, we get 111 per second, for a much more typically sized script.
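For anyone who wants to repeat this, a sketch of the experiment follows. It is an approximation, since the exact contents of mytest.pl and mytest2.pl aren't shown: it writes scripts with no padding, about 64 KB, and about 1.1 MB of 80-column comment lines into a temporary directory, runs each one in a fresh perl process, and turns the average time per run into runs per second (1 / seconds, which is where the 45, 111 and 125 figures above come from). Absolute numbers will depend on the machine and the file cache.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use Time::HiRes qw(time);

# Write a Hello/Goodbye script with $pad_lines comment lines (80 '#' each) in between.
sub write_padded_script {
    my ($dir, $name, $pad_lines) = @_;
    my $path = File::Spec->catfile($dir, $name);
    open my $fh, '>', $path or die "Cannot write $path: $!\n";
    print {$fh} qq{print "Hello world\\n";\n};
    print {$fh} '#' x 80, "\n" for 1 .. $pad_lines;
    print {$fh} qq{print "Goodbye world\\n";\n};
    close $fh or die "Cannot close $path: $!\n";
    return $path;
}

# Run a script in a fresh perl process and return its STDOUT.
sub run_script {
    my ($path) = @_;
    open my $pipe, '-|', $^X, $path or die "Cannot run $path: $!\n";
    my $out = do { local $/; <$pipe> };
    close $pipe or die "$path exited with status $?\n";
    return $out // '';
}

# Average wall-clock seconds per run, including perl startup and compilation.
sub time_script {
    my ($path, $runs) = @_;
    my $start = time;
    run_script($path) for 1 .. $runs;
    return (time - $start) / $runs;
}

my $dir = tempdir(CLEANUP => 1);
# No padding, ~64 KB and ~1.1 MB of padding, roughly the sizes discussed above.
for my $case ([tiny => 0], [medium => 800], [large => 13361]) {
    my ($label, $pad) = @$case;
    my $path = write_padded_script($dir, "$label.pl", $pad);
    my $secs = time_script($path, 10);
    printf "%-6s %8d bytes  %.4f s/run  about %.0f runs/s\n",
        $label, -s $path, $secs, $secs > 0 ? 1 / $secs : 0;
}
```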

      If startup time is a problem you won't solve it by minifying your Perl script. It's better solved by converting over to a daemon process that stays resident, or if that's really impossible, scaling out horizontally.
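The sketch below doesn't build such a daemon (in practice that would be FastCGI, a PSGI server such as Starman, or similar); it only measures the gap a resident process exploits. It compares starting a fresh perl for every simulated request with calling an already compiled handler in a loop. handle_request is a hypothetical stand-in for a script's per-request work.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical stand-in for the work a CGI script does per request.
sub handle_request {
    my ($n) = @_;
    return "Hello world, request $n\n";
}

# CGI model: every request pays for starting and compiling a new perl.
sub cost_fresh_processes {
    my ($requests) = @_;
    my $start = time;
    for (1 .. $requests) {
        system($^X, '-e', 'my $r = "Hello world\n"') == 0
            or die "perl exited with status $?\n";
    }
    return time - $start;
}

# Resident model: the code is compiled once, then the handler is just called again.
sub cost_resident {
    my ($requests) = @_;
    my $start = time;
    my $bytes = 0;
    $bytes += length handle_request($_) for 1 .. $requests;
    return time - $start;
}

my $requests = 20;
printf "fresh processes: %.4f s for %d requests\n", cost_fresh_processes($requests), $requests;
printf "resident loop:   %.6f s for %d requests\n", cost_resident($requests), $requests;
```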


      Dave

      You want to take a step back. What measurements have you made comparing the compilation time of a script with comments and spaces against the same code without them? If you're worried about performance, profile your code (Devel::NYTProf). You should also read "How can I make my CGI script more efficient?". Or just move to an approach that isn't CGI scripts: CGI.pm was removed from the Perl core for good reason, and starting anything new with it is actively discouraged.
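Devel::NYTProf is the right tool for profiling a whole program (run it with perl -d:NYTProf). For the narrower question of whether comments and whitespace cost compile time, a quick micro-benchmark is enough. The sketch below is a different, simpler technique than profiling: it compiles the same small sub with and without 1000 indented comment lines via string eval, never calling it, and reports the average compile time of each.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Average seconds needed to compile (not run) a chunk of code via string eval.
sub compile_seconds {
    my ($source, $reps) = @_;
    my $start = time;
    for (1 .. $reps) {
        # The newlines keep a trailing comment from swallowing the closing brace.
        my $sub = eval "sub {\n$source\n}" or die "Compile failed: $@";
    }
    return (time - $start) / $reps;
}

my $code = join "\n",
    'my $total = 0;',
    '$total += $_ for 1 .. 10;',
    'return $total;';

# The same code with 1000 indented comment lines in front of it.
my $padded = join("\n", map { "    # explanatory comment number $_" } 1 .. 1000) . "\n$code";

printf "compact: %.6f s\n", compile_seconds($code,   200);
printf "padded:  %.6f s\n", compile_seconds($padded, 200);
```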

      If you're that concerned about disk reads, I'd just copy the file into shared memory (e.g. /dev/shm) on system or web server startup, then read the script from there instead.
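A minimal sketch of that staging step, assuming Linux (where /dev/shm is usually a RAM-backed tmpfs) and assuming the web server is then configured to run the copy instead of the original. stage_in_memory and the example path are hypothetical. The sketch sets the execute bit explicitly rather than relying on File::Copy to carry permissions over.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Copy qw(copy);
use File::Spec;

# Copy a script into a RAM-backed directory (default /dev/shm, Linux) once,
# e.g. from a web server startup hook, and return the path of the copy.
sub stage_in_memory {
    my ($script, $target_dir) = @_;
    $target_dir //= '/dev/shm';
    die "$target_dir is not a writable directory\n"
        unless -d $target_dir && -w _;
    my $dest = File::Spec->catfile($target_dir, basename($script));
    copy($script, $dest) or die "Copying $script to $dest failed: $!\n";
    # Make sure the copy is executable, whatever mode copy() gave it.
    chmod 0755, $dest or die "chmod $dest failed: $!\n";
    return $dest;
}

# Hypothetical usage, run once at startup:
# my $fast_path = stage_in_memory('/var/www/cgi-bin/app.pl');
```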

      Or, use a system that only has to read the file once upon web server instantiation.

      This seems like premature optimization.

      Update: I thought some more about this. If you're unit testing your code (which you should be for sure!), you'd have to run the tests again on this automatically re-written code in case something is lost in translation.
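A crude version of that regression check is sketched below, under the assumption that a script's behaviour can be judged by its STDOUT and exit status for one set of arguments: run the original and the rewritten copy in separate perl processes and compare. same_behaviour and the file names in the usage line are hypothetical; a real test suite (Test::More, run via prove) would cover far more.

```perl
use strict;
use warnings;

# Run a Perl script in a separate process; return (STDOUT, wait status).
sub capture_output {
    my ($script, @args) = @_;
    open my $pipe, '-|', $^X, $script, @args
        or die "Cannot run $script: $!\n";
    my $out = do { local $/; <$pipe> };
    close $pipe;    # sets $? to the child's wait status
    return ($out // '', $?);
}

# True if both scripts print the same output and exit the same way
# for the given arguments.
sub same_behaviour {
    my ($original, $rewritten, @args) = @_;
    my ($out_a, $status_a) = capture_output($original,  @args);
    my ($out_b, $status_b) = capture_output($rewritten, @args);
    return $out_a eq $out_b && $status_a == $status_b;
}

# Hypothetical usage:
# print same_behaviour('app.pl', 'app.min.pl', 'foo') ? "ok\n" : "MISMATCH\n";
```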

      I'm all for doing things for education and learning purposes, but I don't think the risk is worth it if the sole objective is to use the code to try to make something a fraction of a nanosecond (obviously estimated) more efficient.
