Use of do() to run lots of perl scripts

chrestomanci has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Use of do() to run lots of perl scripts by stevieb (Canon) on Mar 02, 2021 at 18:34 UTC
I'd say that `do()` is a little faster. The script that was being called/shelled out to is just a few lines and is rather irrelevant:
`Benchmark: timing 100000 iterations of do, sys...`
`do: 4 wallclock secs ( 2.91 usr + 0.82 sys = 3.73 CPU) @ 26809.65/s (n=100000)`
`sys: 441 wallclock secs ( 1.65 usr 19.06 sys + 304.06 cusr 117.50 csys = 442.27 CPU) @ 226.11/s (n=100000)`
`      Rate    sys     do`
`sys   223/s     --   -99%`
`do  26110/s 11617%     --`
Benchmark script: `use warnings; use strict; use Benchmark qw(timethese cmpthese); use lib '.'; timethese( 100000, { do => sub { do 'script.pl' }, sys => sub { system 'perl script.pl' }, } ); cmpthese( 100000, { do => sub { do 'script.pl' }, sys => sub { system 'perl script.pl' }, } );`
Re^2: Use of do() to run lots of perl scripts by choroba (Cardinal) on Mar 02, 2021 at 20:16 UTC
Even with fork, "do" is still much faster. `cmpthese( -3, { do => sub { unless (fork) { do 'script.pl'; exit } wait }, sys => sub { system 'perl script.pl' }, } );` `map{substr$_->[0],$_->[1]||0,1}[\||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^ARGV,3]`
Re^3: Use of do() to run lots of perl scripts by LanX (Saint) on Mar 03, 2021 at 23:59 UTC
> Even with fork "do" is still much faster. Hmm... the whole picture might be more complicated. I just remembered that modern OSes optimize `fork` with copy-on-write of the process's address space. This means that while the start of the fork might be very fast, it can slow down as soon as changes occur. OTOH this could also mean that large parts of the engine don't need to be physically copied, because they are static and never written to. Cheers Rolf _{(addicted to the Perl Programming Language :) Wikisyntax for the Monastery}
Re^4: Use of do() to run lots of perl scripts by jcb (Parson) on Mar 04, 2021 at 03:06 UTC
Re^2: Use of do() to run lots of perl scripts by LanX (Saint) on Mar 02, 2021 at 18:44 UTC
If the intention of using `do` is to run all scripts on the same run-time engine, how are problems with changes in global state by individual scripts avoided? Cheers Rolf
Re^3: Use of do() to run lots of perl scripts by jcb (Parson) on Mar 03, 2021 at 03:29 UTC
That is the easy part that our questioner already mentioned: `fork or do $script;` or `unless (fork) { do $script; exit; }` Performance gains here will depend on how well the system implements `fork`: all modern real operating systems use copy-on-write, so `fork` itself will be very quick, but each child will execute `do FILE` independently. This latter step means that perl will still need to compile every script for each request, which is probably our questioner's actual overhead problem. The best solution is probably to refactor the Perl scripts into modules that can be loaded into the master process, duplicated with everything else at `fork`, and then executed quickly in the forked child. Another possible workaround for compiling the scripts may be B::Bytecode and ByteLoader, although they do have some limitations. In this case, you would want the master process to have already loaded ByteLoader before forking: `use ByteLoader ();` will load the module without calling its `import` method.
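A minimal sketch of the `unless (fork) { do $script; exit; }` pattern with the child's exit status collected by the parent. The worker script here is a hypothetical one-liner written to a temp file, and `run_script_in_child` is an illustrative name, not something from the thread.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

$| = 1;    # flush STDOUT at once so forked children inherit no buffered output

# Hypothetical worker script, written to a temp file so the sketch is self-contained.
my ($fh, $script) = tempfile(SUFFIX => '.pl', UNLINK => 1);
print {$fh} "exit 7;\n";
close $fh or die "cannot close $script: $!";

# Fork, run $path with do FILE in the child, and return the child's exit status.
sub run_script_in_child {
    my ($path) = @_;
    my $pid = fork;
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: whatever the script does to global state ends with this process.
        do $path;
        # Only reached if the script did not call exit itself.
        if ($@) { warn "$path failed: $@"; exit 255 }
        exit 0;
    }
    waitpid $pid, 0;
    return $? >> 8;
}

print "worker exited with status ", run_script_in_child($script), "\n";
```

The parent never sees the script's changes to globals, but every call still pays for reading and compiling the file, which is the overhead described above.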
Re^4: Use of do() to run lots of perl scripts by LanX (Saint) on Mar 03, 2021 at 10:19 UTC
Re^5: Use of do() to run lots of perl scripts by jcb (Parson) on Mar 04, 2021 at 02:54 UTC
Re: Use of do() to run lots of perl scripts by LanX (Saint) on Mar 02, 2021 at 18:26 UTC
> Would it make sense to modify my application so that worker perl scripts are invoked via do() instead of system calls?
No, the most common bottlenecks are
- access to the FS to read the file
- compilation time at start-up
`do FILE` can't solve this; only pre-compiling all scripts and keeping them in memory can, the way mod_perl or FCGI do. The first step, though, should be a thorough analysis of where the performance is lost.* Worst case, you'll need to expand to a multi-server farm. Cheers Rolf
*) sometimes a RAM disk is the easiest solution.
Update: I missed the info about the pre-forking. See Re^4: Use of do() to run lots of perl scripts for more.
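One way to picture "pre-compiling all scripts and keeping them in memory" is to wrap each script body in an anonymous sub inside its own package, once, and call the resulting code reference per request. This is only a sketch of the idea, similar in spirit to what registry-style loaders do; the `%source` contents and the `Worker::` package prefix are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical script bodies; a real server would read them from disk once at start-up.
my %source = (
    'hello.pl' => 'return "hello, $_[0]";',
    'sum.pl'   => 'my $total = 0; $total += $_ for @_; return $total;',
);

# Compile every script exactly once into a code reference in its own package.
my %compiled;
for my $name (sort keys %source) {
    (my $pkg = $name) =~ s/\W/_/g;    # hello.pl -> hello_pl
    my $code = eval "package Worker::$pkg; sub {\n$source{$name}\n}";
    die "cannot compile $name: $@" unless $code;
    $compiled{$name} = $code;
}

# Each later request is a plain sub call: no file access, no recompilation.
print $compiled{'hello.pl'}->('monks'), "\n";
print $compiled{'sum.pl'}->(1, 2, 3), "\n";
```

Scripts that call `exit`, rely on file-scoped lexicals as globals, or assume a fresh interpreter would need changes before they behave under this scheme.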
Re: Use of do() to run lots of perl scripts by shmem (Chancellor) on Mar 03, 2021 at 00:26 UTC
> Would it make sense to modify my application so that worker perl scripts are invoked via do() instead of system calls?
Giving bad advice is difficult to avoid if the situation isn't clear. In this case: of what kind are the performance issues? Memory, responsiveness, load of the server? do file reads a file and executes it. That happens in an unrelated scope with regards to the main script, but it can still stomp on global variables or those declared with `our`, redefine subroutines and whatnot. So all of these 500+ scripts must be reviewed for these issues before they are executed via `do`. Having done that, I would make each script into a file containing just a distinct subroutine, and have them executed via AutoLoader. See POSIX for an example. If memory is an issue, I'd use AutoReloader, keep track of invocation time and memory usage, and periodically unload (via an alarm handler) those scripts (now subroutines) which are seldom used and/or cause the most memory impact. perl -le'print map{pack c,($-++?1:13)+ord}split//,ESEL'
Re^2: Use of do() to run lots of perl scripts by Anonymous Monk on Mar 03, 2021 at 00:51 UTC
`do file reads a file and executes it. That happens in an unrelated scope with regards to the main script,` Same scope is same scope
Re^3: Use of do() to run lots of perl scripts by LanX (Saint) on Mar 03, 2021 at 00:58 UTC
> Same scope is same scope Nope! `It also differs in that code evaluated with do FILE cannot see lexicals in the enclosing scope` Cheers Rolf
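The difference can be checked with a small experiment. In this sketch (variable names and the temp file are illustrative), a file run via `do FILE` changes a package variable of the caller but cannot see the caller's `my` variable, while `eval STRING` can.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

our $global  = 'set by main';
my  $lexical = 'set by main';

# Hypothetical script: modifies a package variable, then looks for $lexical.
my ($fh, $file) = tempfile(SUFFIX => '.pl', UNLINK => 1);
print {$fh} <<'END_SCRIPT';
$main::global = 'changed by script';
defined $lexical ? 'saw lexical' : 'no lexical';
END_SCRIPT
close $fh or die "cannot close $file: $!";

# do FILE: the file is compiled in its own lexical scope, so $lexical there
# is the (undefined) package variable $main::lexical, not our "my" variable.
my $via_do = do $file;

# eval STRING: compiled inside the enclosing scope, so the lexical is visible.
my $via_eval = eval q{ defined $lexical ? 'saw lexical' : 'no lexical' };

print "do FILE:     $via_do\n";
print "eval STRING: $via_eval\n";
print "global now:  $global\n";
```

So both posters have a point: lexicals are isolated, but package globals, subroutine definitions and other interpreter-wide state are shared with the calling program.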
Re: Use of do() to run lots of perl scripts by LanX (Saint) on Mar 03, 2021 at 15:48 UTC
Let's suppose you are always forking and the RAM consumption of all those forks (which means cloning the run-time engine, which needs 1.5MB++ on my system) is no issue. Let's suppose further that it's the start-up time that matters.

As already explained, `do FILE` still implies overhead for
A. loading the file from the FS
B. compiling that file (B2. compiling all modules used in that file)
C. running that code

But `do` is just a glorified eval `cat FILE` mechanism. So your MASTER process could just keep all those scripts and used modules* in a big hash `$file{SCRIPT}`. Now, after forking to child THIS_SCRIPT_1, you only need to eval this script directly to compile it, and you have already rid yourself of point A, the FS overhead. (As a further optimization you could now empty the hash `%file` to release memory in the child, though I'm not sure if this would pay off.)

And now, provided that this script is started many times, you can fork again from THIS_SCRIPT_1 after compilation, and THIS_SCRIPT_2 is executed with a clean start-up context and terminated at the end. Every time MASTER needs this particular script to be run again, it tells child 1, which forks again into a grandchild that runs the precompiled code, solving the overhead of point B.

Now it's up to you to decide if you want to keep all these 500 level-1 child forks constantly alive by reserving 1-2GB of RAM for them. I'd rather make it dependent on a frequency count kept by the MASTER. (And I still doubt that compile time is an issue nowadays, but YMMV.)

I still think you are most probably reinventing the wheel here, because such strategies have certainly already been discussed in the context of web servers. But it could solve your issues by buying time with space, and you will still need to benchmark all of this. Cheers Rolf

Update: *) used modules: B2 is a bit more complicated; anything that hooks into @INC could be used. You said these scripts are "simple"; do they always use the same modules? In that case `require` the common ones in MASTER, which will do the compilation (hopefully without global side-effects). See also App::FatPacker et al...
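The two-level scheme can be sketched roughly as follows, under several assumptions that are not in the post: one level-1 child, a line-based protocol over two pipes (`run` / `quit`), sequential requests, and a hypothetical script body held in `%file`. Function names are illustrative.

```perl
use strict;
use warnings;
use IO::Handle;

$| = 1;    # flush STDOUT at once so forked children inherit no buffered output

# Hypothetical script source held in memory by MASTER (point A is paid once).
my %file = ('answer.pl' => 'exit 42;');

# Start a level-1 child that compiles one script, then forks a grandchild per request.
sub start_compiled_child {
    my ($name) = @_;
    pipe(my $req_r, my $req_w) or die "pipe failed: $!";    # MASTER -> child
    pipe(my $res_r, my $res_w) or die "pipe failed: $!";    # child -> MASTER
    my $pid = fork;
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        close $req_w;
        close $res_r;
        $res_w->autoflush(1);
        my $code = eval "sub {\n$file{$name}\n}";            # point B is paid once
        if (!$code) { warn "cannot compile $name: $@"; exit 255 }
        %file = ();                                          # sources not needed here any more
        while (defined(my $line = <$req_r>)) {
            last if $line eq "quit\n";
            my $kid = fork;
            if (!defined $kid) { warn "fork failed: $!"; exit 255 }
            if ($kid == 0) { $code->(); exit 0 }             # grandchild: clean run, then gone
            waitpid $kid, 0;
            my $status = $? >> 8;
            print {$res_w} "$status\n";
        }
        exit 0;
    }
    close $req_r;
    close $res_w;
    $req_w->autoflush(1);
    return { pid => $pid, to => $req_w, from => $res_r };
}

# Ask the level-1 child for one run and return the grandchild's exit status.
sub run_once {
    my ($child) = @_;
    print { $child->{to} } "run\n";
    my $status = readline($child->{from});
    die "level-1 child went away" unless defined $status;
    chomp $status;
    return $status;
}

# Tell the level-1 child to finish and reap it.
sub stop_child {
    my ($child) = @_;
    print { $child->{to} } "quit\n";
    waitpid $child->{pid}, 0;
}

my $child = start_compiled_child('answer.pl');
print "run $_: status ", run_once($child), "\n" for 1 .. 2;
stop_child($child);
```

A real server would also need concurrent requests, argument passing, and a policy for which level-1 children to keep alive, none of which this sketch attempts.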
Re^2: Use of do() to run lots of perl scripts by afoken (Chancellor) on Mar 03, 2021 at 18:35 UTC
> do is just a glorified eval `cat FILE` mechanism.
... without creating a sub-process and running cat, of course. Alexander -- Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re^3: Use of do() to run lots of perl scripts by LanX (Saint) on Mar 03, 2021 at 18:37 UTC
Of course, that's why I said "mechanism" and linked to the `do`cs:
> do "./stat.pl" is largely like eval `cat stat.pl`; except that it's more concise, runs no external processes, and keeps track of the current filename for error messages. It also differs in that code evaluated with do FILE cannot see lexicals in the enclosing scope; eval STRING does.
Cheers Rolf
Re: Use of do() to run lots of perl scripts by bliako (Monsignor) on Mar 02, 2021 at 21:44 UTC
Can a webserver do this better than your diy-server (re: pre-fork, pre-load)? Even unix shell scripts can talk CGI.
Re^2: Use of do() to run lots of perl scripts by LanX (Saint) on Mar 02, 2021 at 22:51 UTC
I'd certainly try to prototype this first with something like FCGI plus some high-performance webserver, probably nginx. There are so many potential problems already solved, so much know-how already available in the human cloud, like concepts for scaling up with a server farm... "Standing on the shoulders of giants" and so on... And if this "prototype" doesn't deliver sufficiently, one can still learn from the concepts when implementing the DIY server. That said, I still doubt the OP has really identified the bottlenecks of his existing solution. Cheers Rolf
Re: Use of do() to run lots of perl scripts (don't) by Anonymous Monk on Mar 03, 2021 at 00:28 UTC
'do' is too simple. One approach to the issue is CGI::Compile (Compile .cgi scripts to a code reference like ModPerl::Registry), but even that requires proper "Coping with Scoping", also known as the spirit of strict, also known as the life cycle of variables, or memory management.
Re: Use of do() to run lots of perl scripts by Anonymous Monk on Mar 03, 2021 at 21:19 UTC
I see lots of "solutions" being tossed around here on a presumption of what the actual problem is. IS the source of the performance issue ACTUALLY the overhead of launching a new Perl? If the program blindly launches a new subprocess with every request, the problem might simply be "thrashing." Maybe the application needs a way to put a cap on how many children are running at a time. You say that most of the scripts are trivial, but how many are run most often? Are any of them actually expensive? You need log files that you can analyze to see what the bottlenecks actually are. Guesses won't get you answers.
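A cap on concurrent children can be sketched with plain `fork` and `wait`. The function name `run_capped` and the toy jobs are illustrative assumptions; in the questioner's server each job would be whatever launches one worker script.

```perl
use strict;
use warnings;

$| = 1;    # flush STDOUT at once so forked children inherit no buffered output

# Run each job (a code ref) in its own child, never more than $max_children at once.
# Returns a hash ref of job index => exit status, plus the peak number of live children.
sub run_capped {
    my ($max_children, @jobs) = @_;
    my (%running, %status);
    my $peak = 0;

    # Block until one child finishes and record its exit status.
    my $reap = sub {
        my $pid  = wait;
        my $code = $? >> 8;
        if ($pid == -1) { %running = (); return }    # nothing left to wait for
        my $job = delete $running{$pid};
        $status{$job} = $code if defined $job;
    };

    for my $i (0 .. $#jobs) {
        $reap->() while scalar(keys %running) >= $max_children;
        my $pid = fork;
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) { $jobs[$i]->(); exit 0 }     # child: run the job and leave
        $running{$pid} = $i;
        $peak = scalar(keys %running) if scalar(keys %running) > $peak;
    }
    $reap->() while %running;
    return (\%status, $peak);
}

# Five toy jobs that just exit with distinct codes, at most two running at a time.
my @jobs = map { my $n = $_; sub { exit $n } } 1 .. 5;
my ($status, $peak) = run_capped(2, @jobs);
print "job $_ exited with $status->{$_}\n" for sort { $a <=> $b } keys %$status;
print "peak concurrent children: $peak\n";
```

Logging the start and end time of each job inside `run_capped` would be one cheap way to get the kind of data the post asks for before changing anything else.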