Re: Use of do() to run lots of perl scripts

by stevieb (Canon)
on Mar 02, 2021 at 18:34 UTC (#11129025)


in reply to Use of do() to run lots of perl scripts

I'd say that do() is a little faster. The script that was being called/shelled out to is just a few lines and is rather irrelevant:

Benchmark: timing 100000 iterations of do, sys...
        do:   4 wallclock secs ( 2.91 usr +  0.82 sys =  3.73 CPU) @ 26809.65/s (n=100000)
       sys: 441 wallclock secs ( 1.65 usr 19.06 sys + 304.06 cusr 117.50 csys = 442.27 CPU) @ 226.11/s (n=100000)

       Rate    sys     do
sys   223/s     --   -99%
do  26110/s 11617%     --

Benchmark script:

use warnings;
use strict;

use Benchmark qw(timethese cmpthese);
use lib '.';

timethese( 100000, {
    do  => sub { do 'script.pl' },
    sys => sub { system 'perl script.pl' },
} );

cmpthese( 100000, {
    do  => sub { do 'script.pl' },
    sys => sub { system 'perl script.pl' },
} );
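
The benchmark discards the return value of do, so a script that fails to load would just look very fast. As a minimal sketch of checking for that: do FILE returns undef and sets $@ when the file cannot be compiled, or $! when it cannot be read. The helpers run_script and write_temp_script are hypothetical names invented for this example, and the temporary files stand in for script.pl. Note that since Perl 5.26 the current directory is no longer in @INC, which is why a bare 'script.pl' needs the use lib '.' above (or a './script.pl' path).

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper (not from the post): run a file with do FILE and
# return (value, error). The error is undef when nothing went wrong.
sub run_script {
    my ($path) = @_;
    local ($@, $!);
    my $value = do $path;
    if (!defined $value) {
        return (undef, "cannot compile $path: $@") if $@;
        return (undef, "cannot read $path: $!")    if $!;
    }
    return ($value, undef);
}

# Write a throwaway file so the example is self-contained.
sub write_temp_script {
    my ($source) = @_;
    my ($fh, $path) = tempfile(SUFFIX => '.pl', UNLINK => 1);
    print {$fh} $source;
    close $fh or die "close $path: $!";
    return $path;
}

# A script that loads fine: its last expression is the return value.
my ($value, $error) = run_script(write_temp_script('40 + 2;'));
print defined $error ? "failed: $error\n" : "returned: $value\n";

# A script with a syntax error: do returns undef and fills in $@.
(undef, $error) = run_script(write_temp_script('my $x = ;'));
print "failed: $error\n" if defined $error;
```

A script whose last statement legitimately evaluates to undef cannot be told apart from a failure by the return value alone, which is why the sketch looks at $@ and $! as well.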

Replies are listed 'Best First'.
Re^2: Use of do() to run lots of perl scripts
by choroba (Archbishop) on Mar 02, 2021 at 20:16 UTC
    Even with fork "do" is still much faster.
    cmpthese( -3, {
        do  => sub { unless (fork) { do 'script.pl'; exit } wait },
        sys => sub { system 'perl script.pl' },
    } );
    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
      > Even with fork "do" is still much faster.

      Hmm... the whole picture might be more complicated.

      I just remembered that modern OSes optimize fork with copy-on-write of the process's address space.

      This means that while the fork itself might start very fast, it can slow down as soon as writes occur.

      OTOH this could also mean that large parts of the engine don't need to be physically copied, because they are static and no write is possible.

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      Wikisyntax for the Monastery

        There will be brief latency spikes as CoW page links are broken. Assuming that there is sufficient RAM available that this does not result in using swap, there will be no lasting slowdown. The latency spikes can hit either parent or child process, whichever first writes to a CoW page.

        Note that newer Linux kernels also have a "kernel same-page merging" feature that opportunistically searches physical memory for pages that happen to have the same contents and replaces them with a single CoW page. If this is enabled, CoW-break latencies can hit even unrelated processes, if the kernel happened to notice that they had pages with the same contents. Note also that CoW-break should be much faster than swap and pages can also be swapped out, so this should not be a significant performance concern.

        The Perl runtime itself is written in C and therefore compiled in advance and demand loaded by mmapping libperl. Read-only mappings like those used for executable machine code are (or should be...) always shared between all processes that map the same file. You should only have one copy of libperl in RAM no matter how many (unrelated) perl processes you have running, but each Perl interpreter has considerable data structures that are built independently and not mapped from the filesystem and therefore will probably not be shareable between unrelated processes, although fork will "copy" them and "same-page merging" could combine them if two processes happen to have byte-identical structures.

Re^2: Use of do() to run lots of perl scripts
by LanX (Sage) on Mar 02, 2021 at 18:44 UTC
    If the intention of using do is to run all scripts on the same run-time engine, how are problems with changes in global state by individual scripts avoided?

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

      That is the easy part that our questioner already mentioned: fork or do $script; or unless (fork) { do $script; exit; }
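
A small sketch of why the fork matters for global state, assuming a throwaway script that increments a package variable. The variable $counter and the script contents are invented for the demonstration. A plain do changes the caller's copy; the forked do changes only the child's copy.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

our $counter = 0;   # stands in for any global state a script might touch

# Throwaway script that bumps the global and returns the new value.
my ($fh, $script) = tempfile(SUFFIX => '.pl', UNLINK => 1);
print {$fh} '$main::counter++; $main::counter;';
close $fh or die "close: $!";

# Plain do FILE: the change lands in this process.
do $script;
my $after_plain_do = $counter;

# fork + do FILE: the child increments its own copy, then exits.
my $pid = fork;
die "fork failed: $!" unless defined $pid;
if ($pid == 0) {
    do $script;
    exit 0;
}
waitpid $pid, 0;
my $after_forked_do = $counter;

print "after plain do:  $after_plain_do\n";    # 1
print "after forked do: $after_forked_do\n";   # still 1 in the parent
```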

      Performance gains here will depend on how well the system implements fork. All modern real operating systems use copy-on-write, so fork itself will be very quick, but each child will execute do FILE independently. This latter step means that perl will still need to compile every script for each request, which is probably our questioner's actual overhead problem.
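
One rough way to see the per-request compile cost is to time do FILE against a call to the same code compiled once. This is only a sketch: the script body is invented, it is tiny, and the numbers will depend heavily on the machine and on how large the real scripts are.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use File::Temp qw(tempfile);

# Invented script body with a little work in it.
my $source = 'my $sum = 0; $sum += $_ for 1 .. 10; $sum;';

# Write it to a throwaway file for the do FILE case.
my ($fh, $script) = tempfile(SUFFIX => '.pl', UNLINK => 1);
print {$fh} $source;
close $fh or die "close: $!";

# Compile the same source once into a code reference.
my $compiled = eval "sub { $source }" or die $@;

# Run each variant for at least one CPU second and print a comparison table.
cmpthese( -1, {
    do_each_time => sub { do $script },     # reads and compiles on every call
    precompiled  => sub { $compiled->() },  # only executes
} );
```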

      The best solution is probably to refactor the Perl scripts into modules that can be loaded into the master process, duplicated with everything else at fork, and then executed quickly in the forked child.
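
A minimal sketch of that layout, under stated assumptions: My::Task::Hello is a hypothetical module standing in for one refactored script (defined inline here so the example is self-contained), and run_in_child and the %tasks table are names invented for illustration. The code is compiled once in the master; each request only pays for fork and execution.

```perl
use strict;
use warnings;

$| = 1;   # flush STDOUT immediately so children do not inherit buffered output

# Hypothetical module standing in for one refactored script. In a real
# layout it would live in its own .pm file and be loaded with require.
package My::Task::Hello {
    sub run {
        my (%args) = @_;
        return "hello, $args{name}";
    }
}

package main;

# Dispatch table built once in the master, before any fork.
my %tasks = ( hello => \&My::Task::Hello::run );

# Run one preloaded task in a forked child; return its output via a pipe.
sub run_in_child {
    my ($task, %args) = @_;
    my $code = $tasks{$task} or die "unknown task: $task";
    pipe(my $reader, my $writer) or die "pipe failed: $!";
    my $pid = fork;
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: the task is already compiled, so just call it.
        close $reader;
        print {$writer} $code->(%args);
        close $writer;
        exit 0;
    }
    # Parent: collect the child's output, then reap it.
    close $writer;
    my $output = do { local $/; <$reader> };
    close $reader;
    waitpid $pid, 0;
    return $output;
}

print run_in_child('hello', name => 'monks'), "\n";
```

The pipe is just one way to get a result back from the child; a real server would more likely hand the child the client socket directly.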

      Another possible workaround for compiling the scripts may be B::Bytecode and ByteLoader, although they do have some limitations. In this case, you would want the master process to have already loaded ByteLoader before forking: use ByteLoader (); will load the module without calling its import method.

        Yes, I overlooked the fork part until choroba posted his other benchmark.

        Do or even require alone are not fast. Reducing the start up of perl might have a time impact but won't change the RAM consumption.

        My bet on the biggest time consumer is the filesystem, not the compilation. Precompiling really paid off in the 90s, but now?

        So using a RAM-disk could have the best cost-benefit ratio.

        But we are all speculating here. As others have repeated over and over, the OP should be more explicit about:

        • what his problems are (startup, ram, ...)
        • how frequently this happens
        • what he benchmarked.

        I have my doubts that refactoring 500 scripts is an option and even then...

        Precompiling them all into the master process would make them vulnerable to global effects in the BEGIN-phase.
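
To illustrate the BEGIN-phase concern with a tiny sketch: BEGIN blocks run as soon as the code is compiled, so anything they do happens in the master process, before any fork and before the task itself is called. The source string and the @log variable are invented for the demonstration, and eval STRING stands in for loading a file.

```perl
use strict;
use warnings;

our @log;

# Invented source for one "script": a BEGIN block plus the actual task.
my $source = q{
    BEGIN { push @main::log, "BEGIN ran" }
    sub task { push @main::log, "task ran" }
    1;
};

# Compiling the code is enough to trigger the BEGIN block.
eval $source or die $@;
my @after_compile = @log;

# Only now is the task itself executed.
task();
my @after_call = @log;

print "after compile: @after_compile\n";
print "after call:    @after_call\n";
```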

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery
