http://qs1969.pair.com?node_id=11129023

chrestomanci has asked for the wisdom of the Perl Monks concerning the following question:

I am working on a legacy Perl application that frequently invokes other Perl scripts. The application runs as a server, and in response to each remote connection it forks, and the child invokes a worker script via a system call. Most, but not all, of these scripts are Perl.

There are about 500 perl scripts that could be run, and most are trivial. The volume of incoming connections is high, and the application has performance issues, which I suspect are in part caused by the overhead of invoking a fresh perl interpreter for each of these trivial scripts.

I searched online and found the do builtin in perl and this article about it.

Would it make sense to modify my application so that worker perl scripts are invoked via do() instead of system calls?

I am running perl 5.10 via Carton. (The legacy app is not compatible with more recent perl versions due to its use of Storable.)

Would there be any issue with non-trivial perl scripts?

Do I need to wrap the call to do() in an eval block, or does that happen automatically?

Bear in mind that the caller of each worker script is a forked child of the main server process, so if a worker script goes wrong in some way, such as leaking memory, the main server process in the parent should not be affected.
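For concreteness, here is a minimal sketch of the dispatch pattern described above next to the proposed do() variant, using the error checks documented for do (the handler name, script variable, and arguments are invented for illustration):

    use strict;
    use warnings;

    # Hypothetical per-connection dispatcher; $script and @args come from the request.
    sub handle_connection {
        my ($script, @args) = @_;

        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        return if $pid;                      # parent goes back to accepting connections

        # Current approach: start a fresh perl interpreter for every request.
        # system( 'perl', $script, @args );

        # Proposed approach: run the script inside the already-running interpreter.
        # Note that do FILE traps the worker's own die() into $@, much like eval.
        local @ARGV = @args;
        my $return = do $script;
        warn "couldn't parse $script: $@" if $@;
        warn "couldn't read $script: $!"  if !defined $return && !$@;

        exit 0;
    }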

There are no security concerns here. All the code is trusted, and was written by company employees, so I only have to worry about mistakes, not malice.

Replies are listed 'Best First'.
Re: Use of do() to run lots of perl scripts
by stevieb (Canon) on Mar 02, 2021 at 18:34 UTC

    I'd say that do() is a little faster. The script that was being called/shelled out to is just a few lines and is rather irrelevant:

    Benchmark: timing 100000 iterations of do, sys...
            do:   4 wallclock secs (  2.91 usr +   0.82 sys =   3.73 CPU) @ 26809.65/s (n=100000)
           sys: 441 wallclock secs (  1.65 usr  19.06 sys + 304.06 cusr 117.50 csys = 442.27 CPU) @ 226.11/s (n=100000)

             Rate    sys     do
    sys     223/s     --   -99%
    do    26110/s 11617%     --

    Benchmark script:

    use warnings;
    use strict;

    use Benchmark qw(timethese cmpthese);
    use lib '.';

    timethese( 100000, {
        do  => sub { do 'script.pl' },
        sys => sub { system 'perl script.pl' },
    });

    cmpthese( 100000, {
        do  => sub { do 'script.pl' },
        sys => sub { system 'perl script.pl' },
    });
      Even with fork "do" is still much faster.
      cmpthese( -3, {
          do  => sub { unless (fork) { do 'script.pl'; exit } wait },
          sys => sub { system 'perl script.pl' },
      });
      map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
        > Even with fork "do" is still much faster.

        Hmm... the whole picture might be more complicated.

        I just remembered that modern OSes optimize fork with copy-on-write of the process's address space.

        This means that while the start of the fork might be very fast, it can slow down as soon as writes occur, because the touched pages then have to be copied.

        OTOH this could also mean that large parts of the engine never need to be physically copied, because they are static and never written to.

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery

      if the intention of using do is to run all scripts on the same run-time engine, how are problems with changes in global state by individual scripts avoided?

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      Wikisyntax for the Monastery

        That is the easy part that our questioner already mentioned: fork.

            fork  or  do $script;

        or

            unless (fork) { do $script; exit; }

        Performance gains here will depend on how well the system implements fork. All modern real operating systems use copy-on-write, so fork itself will be very quick, but each child will still execute do FILE independently. This latter step means that perl still needs to compile every script for each request, which is probably our questioner's actual overhead problem.

        The best solution is probably to refactor the Perl scripts into modules that can be loaded into the master process, duplicated with everything else at fork, and then executed quickly in the forked child.
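        A minimal sketch of that refactoring; the module name Worker::SendReport, its run() entry point, and the argument handling are all hypothetical:

            # Worker/SendReport.pm -- hypothetical module made from the old send_report.pl
            package Worker::SendReport;
            use strict;
            use warnings;

            sub run {
                my (@args) = @_;
                # ... body of the former script goes here ...
                return 0;    # exit status for the child
            }

            1;

            # --- in the master server, before the accept loop ---
            # use Worker::SendReport ();     # compiled once, shared with children via COW
            #
            # --- in the per-connection child ---
            # my $pid = fork();
            # if ( defined $pid && !$pid ) {
            #     my @request_args = ();                        # filled from the connection
            #     exit( Worker::SendReport::run(@request_args) );
            # }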

        Another possible workaround for compiling the scripts may be B::Bytecode and ByteLoader, although they do have some limitations. In this case, you would want the master process to have already loaded ByteLoader before forking: use ByteLoader (); will load the module without calling its import method.
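        If that route were explored, the invocation would look roughly like the following; this is only a sketch based on the B::Bytecode documentation (on perl 5.10 B::Bytecode and ByteLoader come from the CPAN B::C distribution, not core), so verify the flags against the installed version's docs:

            # Pre-compile a worker script to bytecode once, at deploy time.
            # -H embeds a ByteLoader header so the .plc can be run directly:
            #
            #   perl -MO=Bytecode,-H,-oscript.plc script.pl
            #   perl script.plc
            #
            # In the master process, before forking:
            use ByteLoader ();    # load without importing, as suggested above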

Re: Use of do() to run lots of perl scripts
by LanX (Saint) on Mar 02, 2021 at 18:26 UTC
    > Would it make sense to modify my application so that worker perl scripts are invoked via do() instead of system calls?

    no, the most common bottlenecks are

    • access to FS to read file
    • compilation time at start-up

    do FILE can't solve these; only pre-compiling all scripts and keeping them in memory, the way mod_perl or FCGI do, can.
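    For reference, the persistent-interpreter pattern FCGI uses looks roughly like this (a sketch based on the FCGI module's synopsis; the handler body is made up):

        use strict;
        use warnings;
        use FCGI;

        my $request = FCGI::Request();

        # The script is compiled once; this loop then serves many requests
        # from the same already-running interpreter.
        while ( $request->Accept() >= 0 ) {
            print "Content-type: text/plain\r\n\r\n";
            print "handled by pid $$\n";
        }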

    Though the first step should be a thorough analysis of where the performance is actually lost.°

    Worst-case would be that you'll need to expand to a multi-server farm.

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

    °) sometimes a RAM disk is the easiest solution.

    update

    I missed the info about the pre-forking. See Re^4: Use of do() to run lots of perl scripts for more.

Re: Use of do() to run lots of perl scripts
by shmem (Chancellor) on Mar 03, 2021 at 00:26 UTC
    > Would it make sense to modify my application so that worker perl scripts are invoked via do() instead of system calls?

    Giving bad advice is difficult to avoid if the situation isn't clear; in this case: what kind are the performance issues? Memory, responsiveness, load on the server?

    do FILE reads a file and executes it. That happens in an unrelated scope with regards to the main script, but the executed code can still stomp on global variables or those declared with our, redefine subroutines, and whatnot. So all of these 500+ scripts must be reviewed for such issues before executing them via do.
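    A contrived sketch of the kind of collision meant here; the variable and sub names are invented:

        # server.pl
        our $status = 'serving';
        sub log_msg { print STDERR "[server] @_\n" }

        do './worker.pl';      # worker runs in its own lexical scope ...

        log_msg($status);      # ... but it may have changed both of these


        # worker.pl
        our $status = 'done';  # same package global as the server's
        sub log_msg { }        # redefines the server's sub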

    Having done that, I would make each script into a file containing just a distinct subroutine, and have them executed via AutoLoader. See POSIX for an example. If memory is an issue, I'd use AutoReloader, keep track of invocation time and memory usage, and periodically unload (via an alarm handler) those scripts (now subroutines) which are seldom used and/or cause the most memory impact.
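    As an illustration of that layout (module and sub names invented; see the AutoLoader and AutoSplit docs for the real details):

        # Workers.pm -- one stub module; each former script becomes a sub after __END__
        package Workers;
        use strict;
        use warnings;
        use AutoLoader 'AUTOLOAD';   # loads subs from auto/Workers/*.al on first call

        1;

        __END__

        sub send_report {
            my (@args) = @_;
            # ... body of the old send_report.pl ...
        }

        # The subs after __END__ are split into auto/Workers/send_report.al etc.
        # with AutoSplit, e.g.:
        #   perl -MAutoSplit -e 'autosplit("Workers.pm", "auto", 0, 1, 1)'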

    perl -le'print map{pack c,($-++?1:13)+ord}split//,ESEL'
       do file reads a file and executes it. That happens in an unrelated scope with regards to the main script,

      Same scope is same scope

Re: Use of do() to run lots of perl scripts
by LanX (Saint) on Mar 03, 2021 at 15:48 UTC
    Let's suppose you are always forking and the RAM consumption of all those forks (which means cloning the run-time engine, which needs 1.5MB++ on my system) is no issue.

    Let's suppose further it's the start-up time that matters.

    As already explained, do FILE still implies overhead for

    • A loading the file from FS
    • B compiling that file
    • (B2 compiling all modules used in that file)
    • C running that code

    but do is just a glorified eval `cat FILE` mechanism.

    So your MASTER process could just keep all those scripts and used modules° in a big hash %file, keyed by script name ($file{SCRIPT}).

    Now, after forking to child THIS_SCRIPT_1, you only need to eval that script's source directly to compile it, and you have already eliminated point A, the FS overhead.

    (As a further optimization you could now empty the hash %file to release memory in the child, though I'm not sure this would pay off.)

    And now, provided that this script is started many times, you can fork again from THIS_SCRIPT_1 after compilation, so that THIS_SCRIPT_2 is executed with a clean start-up context and terminates at the end.

    Every time MASTER needs this particular script to be run again, it tells child 1, which forks again into a grandchild that runs the already-compiled code, solving the overhead of point B.

    Now it's up to you to decide whether you want to keep all these 500 level-1 child forks constantly alive by reserving 1-2 GB of RAM for them; I'd rather make that dependent on a frequency count kept by the MASTER. (And I still doubt that compile time is an issue nowadays, but YMMV.)
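    A rough sketch of the simpler half of this idea, caching the script sources in the MASTER and eval'ing them in a forked child; the keep-alive grandchild layer is left out, and %file and @worker_scripts are hypothetical names:

        use strict;
        use warnings;

        # MASTER: slurp every worker script into memory once, at start-up.
        my %file;
        for my $script (@worker_scripts) {          # @worker_scripts: hypothetical list
            open my $fh, '<', $script or die "$script: $!";
            local $/;
            $file{$script} = <$fh>;
        }

        # Per request: fork, then compile and run the cached source in the child.
        # This removes the filesystem read (point A); compilation (point B) still
        # happens in every child.
        sub run_cached {
            my ($script, @args) = @_;
            my $pid = fork();
            die "fork failed: $!" unless defined $pid;
            return if $pid;                         # parent keeps serving

            local @ARGV = @args;
            eval $file{$script};                    # stringy eval is the point here
            warn "$script failed: $@" if $@;
            exit 0;
        }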

    I still think you are most probably reinventing the wheel here, because such strategies have certainly already been discussed in the context of web servers.

    But it could solve your issues by trading memory for time, and you will still need to benchmark all of this.

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

    update

    °) The used modules (B2) are a bit more complicated; anything hooking into @INC could be used (a sketch follows below). You said these scripts are "simple": do they always use the same modules? In that case, require the common ones in MASTER; this does the compilation up front (hopefully without global side effects).

    see also App::FatPacker et al...
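    A sketch of the @INC-hook idea from the footnote, serving module source from an in-memory cache; %module_source is a hypothetical hash filled by the MASTER:

        use strict;
        use warnings;

        my %module_source;   # e.g. 'Some/Helper.pm' => '...source text...', filled by MASTER

        # An @INC hook: require() calls it with the module's relative file name and,
        # if we have the source cached, gets back an in-memory filehandle to compile from.
        unshift @INC, sub {
            my (undef, $filename) = @_;
            return unless exists $module_source{$filename};
            open my $fh, '<', \$module_source{$filename}
                or die "can't open in-memory source for $filename: $!";
            return $fh;
        };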

      do is just a glorified eval `cat FILE` mechanism.

      ... without creating a sub-process and running cat, of course.

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
        of course that's why I said "mechanism" and linked to the docs.

        do "./stat.pl" is largely like

        eval `cat stat.pl`;

        except that it's more concise, runs no external processes, and keeps track of the current filename for error messages. It also differs in that code evaluated with do FILE cannot see lexicals in the enclosing scope; eval STRING does.
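        A tiny illustration of that last difference (the file name is invented):

            use strict;
            use warnings;

            my $who = 'monk';

            eval q{ print "hello $who\n" };   # eval STRING sees the lexical: "hello monk"

            # snippet.pl contains just:   print "hello $who\n";
            do './snippet.pl';                # do FILE cannot see the lexical $who,
                                              # so it sees an (undef) $main::who instead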

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery

Re: Use of do() to run lots of perl scripts
by bliako (Monsignor) on Mar 02, 2021 at 21:44 UTC

    Can a webserver do this better than your DIY server (re: pre-fork, pre-load)? Even Unix shell scripts can talk CGI.

      I'd certainly try to prototype this first with something like FCGI plus some high performance webserver - probably nginx.

      There are so many potential problems already solved, so much know-how already available in the human cloud.

      Like the concept of scaling up with a server farm...

      "Standing on shoulders of giants" and so on...

      And if this "prototype" doesn't deliver sufficiently, one can still learn from the concepts when implementing the DIY server.

      That said, I still doubt the OP has really identified the bottlenecks of the existing solution.

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      Wikisyntax for the Monastery

Re: Use of do() to run lots of perl scripts (don't)
by Anonymous Monk on Mar 03, 2021 at 00:28 UTC
Re: Use of do() to run lots of perl scripts
by Anonymous Monk on Mar 03, 2021 at 21:19 UTC
    I see lots of "solutions" being tossed around here on a presumption of what the actual problem is. IS the source of the performance issue ACTUALLY the overhead of launching a new Perl?

    If the program blindly launches a new subprocess with every request, the problem might simply be "thrashing." Maybe the application needs a way to put a cap on how many children are running at a time.

    You say that most of the scripts are trivial, but how many are run most often? Are any of them actually expensive? You need log files that you can analyze to see what the bottlenecks actually are. Guesses won't get you answers.
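    One minimal way to get such numbers, assuming the dispatch point is easy to find; Time::HiRes is core, while the log path and wrapper name are made up:

        use strict;
        use warnings;
        use Time::HiRes qw(gettimeofday tv_interval);

        # Wrap the existing dispatch so every worker run is logged with its duration.
        sub timed_system {
            my (@cmd) = @_;
            my $t0  = [gettimeofday];
            my $rc  = system(@cmd);
            my $sec = tv_interval($t0);

            open my $log, '>>', '/var/log/myapp/worker-times.log' or warn "log: $!";
            printf {$log} "%d %.4f %d %s\n", time, $sec, $rc >> 8, "@cmd" if $log;
            return $rc;
        }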