http://qs1969.pair.com?node_id=11126026

cherio has asked for the wisdom of the Perl Monks concerning the following question:

When I run strace on my script (I've seen discussions that start this way, but no answer to the question that follows), it performs a lot of "stat" lookups before it succeeds in finding the right modules. A sample sequence of "stat"-ed file candidates for use Time::HiRes is below, with the last one being the valid path.

/etc/perl/Time/HiRes.pmc
/etc/perl/Time/HiRes.pm
/usr/local/lib/x86_64-linux-gnu/perl/5.30.0/Time/HiRes.pmc
/usr/local/lib/x86_64-linux-gnu/perl/5.30.0/Time/HiRes.pm
/usr/local/share/perl/5.30.0/Time/HiRes.pmc
/usr/local/share/perl/5.30.0/Time/HiRes.pm
/usr/lib/x86_64-linux-gnu/perl5/5.30/Time/HiRes.pmc
/usr/lib/x86_64-linux-gnu/perl5/5.30/Time/HiRes.pm
/usr/share/perl5/Time/HiRes.pmc
/usr/share/perl5/Time/HiRes.pm
/usr/lib/x86_64-linux-gnu/perl/5.30/Time/HiRes.pmc
/usr/lib/x86_64-linux-gnu/perl/5.30/Time/HiRes.pm

We use a lot of real-time Perl scripting in an environment with high IOPS. The accumulated load-time lag becomes noticeable. I am looking for a way to eliminate lookups on non-existent files and directories.

One solution I can think of is to create links like

ln -s /usr/lib/x86_64-linux-gnu/perl/5.30/Time /etc/perl/Time
This solution sounds a bit scary. It would require creating dozens of links (one per package), and it would be dependent on the distribution's package management, which may change during upgrades. I am looking for something safer, something more ... standard (?)

LD_PRELOAD environment variable comes to mind but I am not at that level to fully understand how to correctly use it.

Perl seems to have no default /etc configuration, at least not the way much other Linux software uses one. So this is my predicament.

Replies are listed 'Best First'.
Re: Perl startup and excessive "stat" use on module load
by Fletch (Bishop) on Dec 31, 2020 at 05:03 UTC

    Check out perlrun for the details on PERL5LIB in the environment and the search path perl uses; also of interest would be the lib module. That /etc/perl directory along with several other variant directories doesn't look normal, so that may be something your OS' perl set up (if you're using the stock perl, which has its own issues), or you may have something setting PERL5LIB. WRT the number of stats, that's just the way perl searches: it walks each directory on the search path looking for the requested module. The more directories you put on the search path, the more it's going to search.

    So given that if you're seeing unacceptable delays one option would be to trim the search path down. Presuming you've got extra things on PERL5LIB then clearing that out in the environment would be the easiest option. If the extra directories have been compiled into your copy of perl itself then it's not so easy. You'd need to recompile your own perl without those extra directories added to the compiled in search path (but there's also benefits to having your own application perl separate from the OS' copy).
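To see what perl will actually walk, you can dump @INC and watch how PERL5LIB lengthens it. A quick sketch (the `/opt/extra` directory is purely illustrative, and the exact paths printed will differ per system):

```perl
#!/usr/bin/perl
# Sketch: dump the search path, then show that PERL5LIB entries are
# prepended to @INC -- each extra entry means more failed stats for
# modules living later in the path. '/opt/extra' is illustrative.
use strict;
use warnings;

print "compiled-in + env search path:\n";
print "  $_\n" for @INC;

# Run a child perl with PERL5LIB set and ask for its first @INC entry:
local $ENV{PERL5LIB} = '/opt/extra';
my $first = qx{"$^X" -e 'print \$INC[0]'};
print "with PERL5LIB set, \$INC[0] is: $first\n";
```

Clearing PERL5LIB in the environment (or simply not setting it) trims the search back to the compiled-in directories.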

    However, even if you do build your own perl you'll probably wind up with about 4-5 directories to search. If even that much overhead is an issue then there may not be an easy answer short of attempting something like you were thinking with a symlink farm. In that case the better option would be to look at redesigning things to not repeatedly spawn fresh perl processes which are going to need to search; if you can, instead work with a persistent process (or maybe one that runs multiple passes; I'm handwaving here without details, but a bit more about your environment and the "real time perl scripting" involved would probably help get you more applicable suggestions). Amortizing the stats across a longer process lifetime would help lessen the hit for any one item you process.

    The cake is a lie.
    The cake is a lie.
    The cake is a lie.

      If the extra directories have been compiled into your copy of perl itself then it's not so easy.

      I think it's pretty easy - the perlbrew installs on my machine only have four entries in @INC, and building a copy of Perl without perlbrew is pretty easy too.

      $ wget https://www.cpan.org/src/5.0/perl-5.32.0.tar.xz
      $ tar -xaf perl-5.32.0.tar.xz
      $ cd perl-5.32.0
      $ sh Configure -de -Dprefix=/opt/perl5.32
      $ make -j4
      $ TEST_JOBS=4 make test_harness
      $ make install
      $ /opt/perl5.32/bin/perl -le 'print for @INC'
      /opt/perl5.32/lib/site_perl/5.32.0/x86_64-linux
      /opt/perl5.32/lib/site_perl/5.32.0
      /opt/perl5.32/lib/5.32.0/x86_64-linux
      /opt/perl5.32/lib/5.32.0

        It's not rocket science, Smithers; it's brain surgery! However, when compared against (possibly) just unsetting an environment variable, that's a little bit more effort.

        Slight snark aside though: yes, that's certainly something the OP should consider doing anyhoo if they're using the stock perl on their platform, @INC length issues notwithstanding. If that also solves their stat overhead problem, all the better.

        The cake is a lie.
        The cake is a lie.
        The cake is a lie.

Re: Perl startup and excessive "stat" use on module load
by shmem (Chancellor) on Dec 31, 2020 at 15:36 UTC
    We use a lot of real-time Perl scripting in an environment with high IOPS. The accumulated load-time lag becomes noticeable.

    What is the amount of time needed for the stat syscalls compared to the time needed for reading and compiling a single module file? I would answer that question first. To me, without having proof, it looks like micro-optimization in the wrong place. Making stuff resident in memory to avoid startup overhead and/or having persistent processes is imho a better optimization. Of course, it depends on what you are doing.
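That comparison can be sketched in a few lines. This is illustrative only, not a rigorous benchmark; on a warm cache the failed stats are typically microseconds, while compiling a real module is typically milliseconds:

```perl
#!/usr/bin/perl
# Rough, illustrative comparison: one failed stat per @INC entry vs.
# one real module load (read + compile). Not a rigorous benchmark.
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

my $t0 = [gettimeofday];
stat "$_/No/Such/Module.pm" for @INC;      # one failed stat per search dir
my $stat_time = tv_interval($t0);

$t0 = [gettimeofday];
require POSIX;                             # read and compile a real module
my $load_time = tv_interval($t0);

printf "failed stats: %.6fs   module load: %.6fs\n", $stat_time, $load_time;
```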

    Another way to avoid overhead is e.g. App::FatPacker - lump it all together and you have a single file to stat, and btw you avoid all those open/close syscalls.

    perl -le'print map{pack c,($-++?1:13)+ord}split//,ESEL'
Re: Perl startup and excessive "stat" use on module load
by LanX (Saint) on Dec 31, 2020 at 04:55 UTC
    It's a feature not a bug.

    Manipulating the libs helps you experiment dynamically with different versions.

    So whenever you add a new module to one of your libraries, you'd need to adjust your static load-path solution again.

    Are you really sure it's worth it?

    I'm not aware of any standard way to do this or even a module for it.

    That's how I'd do it.

    I would dump %INC after all modules are loaded, because its values are the paths where the modules were found.

    Then I'd set an @INC hook in the first position which loads exactly those recorded modules. See require

    We just had a discussion on hooks.

    Here's a working example: [WEBPERL] dynamically importing non-bundled modules via http

    What you'll need is a mechanism to deactivate the hook so that you can record the normal load paths again.

    Like an extra flag in %ENV.
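A minimal sketch of that hook mechanism. Here the "recorded" module is an in-memory scalar so the example is self-contained; in a real setup the %cache values would be the absolute paths dumped from %INC on a previous run, and the hook would open those files. `My::Cached` is a made-up stand-in:

```perl
use strict;
use warnings;

# In a real setup this hash would be loaded from a dump of %INC taken
# after a previous run; My::Cached and its inline source are stand-ins.
my %cache = ('My/Cached.pm' => \"package My::Cached; sub answer { 42 } 1;\n");

unshift @INC, sub {
    my (undef, $module) = @_;            # relative name, e.g. "Time/HiRes.pm"
    my $src = $cache{$module} or return; # unknown module: normal search resumes
    open my $fh, '<', $src or return;    # open the recorded source directly
    return $fh;                          # perl compiles from this handle
};

require My::Cached;                      # satisfied by the hook, no dir scan
print My::Cached::answer(), "\n";        # 42
```

The env-var switch mentioned above would simply guard the `unshift`, so that without the flag set perl searches (and you can re-record) as usual.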

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

    update

    I should have mentioned that the usual way to avoid load overhead is a persistent process, as with FastCGI or other daemon solutions like Proc::Daemon .

    That's faster than your idea because it also avoids compiling all the code on every start.

Re: Perl startup and excessive "stat" use on module load
by salva (Canon) on Dec 31, 2020 at 11:42 UTC
    In practice those syscalls are not going to result in IO operations because they all refer to a small set of directories that are going to be cached. So, there is really no need to worry about them.
Re: Perl startup and excessive "stat" use on module load
by eyepopslikeamosquito (Archbishop) on Dec 31, 2020 at 06:53 UTC

    If you are building your own Perl from source, you should be able to find a solution. Unfortunately, it's been ages since I've done any of this, but as indicated here I used to build Perl (and many CPAN modules) from source on about 10 different Unix flavours and bundle it with our product in such a way that our product did not touch the customer /lib or /usr/lib. To achieve this (and to avoid the dreaded LD_LIBRARY_PATH), I'm pretty sure our install script binary-edited our executables, replacing their artificially long build PATH with the customer chosen install directory (with our lib sub-directory appended).

    A few random nodes that might give you some ideas to try:

    Sorry I can't be of more assistance; I just threw this out there in case it gives you some fresh ideas to try.

Re: Perl startup and excessive "stat" use on module load
by thomas895 (Deacon) on Dec 31, 2020 at 06:58 UTC

    In a similar vein to what Fletch suggests, you can re-arrange @INC to have the directory you believe is most likely to contain the module come first. (Perhaps rotate the array?)

    Of course, if the Perl ever changes (by way of an update) or the include path is otherwise changed (someone sets PERL5LIB in a shell script somewhere up the chain), then you'll still incur some extra lookups. As long as you don't remove any @INC entries, at worst it will perform similarly to how it does now.
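That re-arrangement has to happen at compile time, before any `use` statements take effect, so a BEGIN block is the natural place. The directory name here is purely illustrative; you'd pick whichever entry actually holds most of your modules:

```perl
use strict;
use warnings;

# Rotate the directory most likely to hold your modules to the front of
# @INC; '/usr/share/perl5' is illustrative -- pick your own best-hit dir.
BEGIN {
    my $hot = '/usr/share/perl5';
    @INC = ($hot, grep { $_ ne $hot } @INC);   # front it, drop any duplicate
}

use Time::HiRes ();   # still found via later entries if $hot lacks it
print $INC[0], "\n";
```

Because no entries are removed, a module that doesn't live under the fronted directory just costs one extra failed stat, exactly as described above.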

    -Thomas
    "Excuse me for butting in, but I'm interrupt-driven..."
Re: Perl startup and excessive "stat" use on module load
by NERDVANA (Deacon) on Jan 04, 2021 at 00:39 UTC

    You might see a lot of “stat” but it would be really unusual if that was actually impacting your performance. In most cases, the actual time required for perl to compile those modules would vastly outweigh 12 calls to stat. The one exception to this that I’ve read about is if you use a docker container built from a ton of layers. In that case, each call to stat triggers one stat call for each layer of the filesystem, so 12 stat calls could actually be 1200 or something stupid like that. You can fix that problem by “flattening” the docker filesystem.

    If you want a program that runs fast, write a daemon that stays running. If this is a web app, there are lots of pre-forking options to choose from. (the program loads once, then forks a copy for each worker, then the workload of incoming requests is divided among the workers) See Starman for example.
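The bare shape of that pre-fork pattern is only a few lines. Starman and friends do this for real (socket sharing, respawning, signal handling); this sketch just shows where the load cost is paid:

```perl
#!/usr/bin/perl
# Sketch of the pre-fork idea: pay the module-load cost once in the
# parent, then fork workers that inherit the already-compiled code.
use strict;
use warnings;
use Time::HiRes ();          # stand-in for your expensive module loads

my @kids;
for my $n (1 .. 3) {
    my $pid = fork;
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {         # child: everything above is already compiled
        print "worker $n ready\n";
        exit 0;
    }
    push @kids, $pid;
}
waitpid $_, 0 for @kids;     # a real daemon would supervise and respawn
```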

    If you truly need to start new processes rapid-fire, then the only real way to get performance is to restrict your program to lightweight modules with as few dependencies as possible. For instance, avoid anything that uses Moose or DateTime or several dozen other popular and useful but heavyweight modules. You can of course also reduce the search path as described in other posts here, but I don’t think it’ll make a measurable difference.