in reply to Wisdom for building an Efficient Perl/PerlMagick?

Is performing the runtime fire drill of assembling myriad program fragments, hoping they still fit, so you can have the illusion of tiny executables an anachronistic sub-optimization?

If having "the illusion of tiny executables" was the only reason, or a primary reason, (or even a good reason) for using dynamically linked libraries, it probably would be anachronistic; but since it it isn't, it isn't.

Dynamic linking allows perl to use any one of, and any combination of, thousands of CPAN modules that contain one (or more) XS (C/C++/Fortran/whatever) components, without the user having to recompile their perl executable (and resolve all the conflicts, duplicate dependencies and idiocies) every time they want to add a new package to their installation.
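
As a minimal sketch of the mechanism (the module name here is hypothetical): the pure-Perl stub that ships with a typical XS distribution does little more than ask XSLoader to load the compiled shared object at use time, which is why the perl binary itself never needs relinking.

    package My::XSDemo;                  # hypothetical XS-backed module
    use strict;
    use warnings;

    our $VERSION = '0.01';

    # At 'use My::XSDemo' time, XSLoader searches @INC for
    # auto/My/XSDemo/XSDemo.(so|dll|dylib) and loads it into the
    # already-running interpreter; no rebuild of perl is involved.
    require XSLoader;
    XSLoader::load(__PACKAGE__, $VERSION);

    1;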

The alternatives to dynamic linking are:

I have read that there is roughly a 10%-30% performance penalty for shared Perl vs static.

News: Not everything you read on the internet is true.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^2: Wisdom for building an Efficient Perl/PerlMagick?
by BrianP (Acolyte) on Mar 17, 2016 at 04:23 UTC
    BUK, I see your point about statically linking to a
    fraction of a million libraries, each with a potentially
    large number of functions and sub-modules. I don't have
    that much RAM on hand!

    And a 1001 Trick Pony like ImageMagick Convert would
    seem like a good place for sharing.

    For specialized tools on the other end of the spectrum,
    the guaranteed costs of every shared run overwhelm the
    speculative, theoretical benefits by many orders of
    magnitude.

    Only if you can have umpteen programs running
    simultaneously and sharing the code AND if you would
    have crashed out of memory otherwise is there any
    chance of a benefit. Arbitrating over who gets it when
    could cost a lot and means YOU NEED MORE RAM.

    If your kids are squabbling over toys, you need more toys!

    You can just swap out a library and not have to recompile?
    If you rent your code and don't have the source, there
    might be a case. But, that's not the Perl Way. You would
    just download the new tgz, rebuild it from source and be
    sure it worked.

    Time is the one essential item you can't make more of.
    Settling up front for doing 4 runs per day with __SLOW__
    libraries instead of 5 runs, forever, because you might be
    able to save a build one day sounds like a hard sell.

    And the case of needing all of CPAN for one program is
    not something you see regularly. How many 1000+ module
    programs have you written? My record is definitely in the
    double digits.

    But statically pulling in every one of the 11 CPAN_zees
    in my hg.pl script (plus Bpbfct.pm, my bloated, 694 KB,
    everything-and-the-kitchen-sink-for-the-last-5-years
    module) should be easily doable.

    The average .gz size from a small sample is < 2 MB
    each. How big is the average .PM?

    loc .pm | grep "\.pm" | grep 5.22 | wc
      4666
    lsr -sA `loc .pm | grep "\.pm" | grep 5.22` | add
      87,175,440

    87 MB / 4666 Perl 5.22 *.pm modules -> ~18,683 bytes per .pm
    I wonder how much byte code you would get on
    average from 18k of Perl??
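
    For the record, a more portable way to repeat that
    count-and-size check is a few lines of File::Find; the
    5.22 lib path below is only an assumed default.

      #!/usr/bin/perl
      # Rough sketch: count the .pm files under a perl 5.22 lib
      # tree and report total and average size. The path is an
      # assumption; point it at your own installation.
      use strict;
      use warnings;
      use File::Find;

      my $root = shift // '/usr/lib/perl5/5.22';   # assumed location
      my ($count, $bytes) = (0, 0);

      find(sub {
          return unless /\.pm\z/ && -f;
          $count++;
          $bytes += -s _;
      }, $root);

      printf "%d .pm files, %d bytes total, %.0f bytes average\n",
          $count, $bytes, $count ? $bytes / $count : 0;
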
      How many 1000+ module programs have you written?

      None. But that's not the point, is it?

      I'm not a big CPAN user, and I have 486 DLLs in the lib and site/lib trees of my default perl installation. Whilst I may never use more than a dozen or so of those in any individual script, I can use any of the 2^486 (~2e146) possible combinations of those in any script I write without re-building that perl.
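
      Out of curiosity, that count (and the size of the combination space) is easy to reproduce on any installation by walking the arch trees for files with $Config{dlext}. A rough sketch, with no claim to robustness:

        #!/usr/bin/perl
        # Sketch: count the dynamically loadable extensions (.so/.dll/...,
        # per $Config{dlext}) installed under this perl's arch directories,
        # then show how many subsets of them a script could combine.
        use strict;
        use warnings;
        use Config;
        use File::Find;
        use Math::BigInt;

        my @trees = grep { defined && -d }
                    @Config{qw(archlibexp sitearchexp vendorarchexp)};
        my $ext   = $Config{dlext};           # 'so', 'dll', 'dylib', ...

        my $count = 0;
        find(sub { $count++ if /\.\Q$ext\E\z/ && -f }, @trees);

        my $subsets = Math::BigInt->new(2)->bpow($count);
        print "$count loadable extensions; 2^$count = $subsets combinations\n";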

      To achieve that flexibility with a statically linked perl, I'd need to try to resolve all the conflicts between those nearly 500 libraries; that's a task that might take years, and quite possibly would never be achievable. It's certainly a task I would not even try to take on; and why would I?

      Time is the one essential item you can't make more of. Settling up front for doing 4 runs per day with __SLOW__ libraries instead of 5 runs, forever, because you might be able to save a build one day sounds like a hard sell.

      That math suggests a 20% speed-up from static linking; and that simply isn't true. The cost of dynamic linking is a one-off cost paid the first time a module is loaded; after that, the cost of calling the functions within it is the same as it would be for statically linked code. And the runtime of any program that runs for 5 or 6 hours isn't going to be dominated by the time taken to load modules.
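
      To put a number on that one-off cost, something along these lines (assuming PerlMagick is installed; the "work" step is only a stand-in) separates the module-load time from the rest of the run:

        #!/usr/bin/perl
        # Sketch: measure the one-time cost of loading a dynamically
        # linked XS module (Image::Magick here) separately from the
        # work that follows it.
        use strict;
        use warnings;
        use Time::HiRes qw(gettimeofday tv_interval);

        my $t0 = [gettimeofday];
        require Image::Magick;            # the dynamic load happens here, once
        my $load = tv_interval($t0);

        $t0 = [gettimeofday];
        my $img = Image::Magick->new;     # stand-in for the real workload
        my $work = tv_interval($t0);

        printf "load: %.4fs  work: %.4fs\n", $load, $work;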

      I seriously doubt you could demonstrate a 1% difference(*) between statically and dynamically linked versions of any program that runs for more than 5 minutes. (I actually think that's true for any script that runs for more than 1 minute; but let's be conservative.) For a script that runs for 5 hours, I doubt you could reliably measure the difference.
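
      And if anyone did want to try, a harness no fancier than this would do it; both perl paths and the workload script below are hypothetical placeholders, not builds I have made:

        #!/usr/bin/perl
        # Sketch: time the same workload under two perl builds and compare
        # the means. The binary paths and the workload script are assumed.
        use strict;
        use warnings;
        use Time::HiRes qw(gettimeofday tv_interval);

        my %perl = (
            dynamic => '/opt/perl-shared/bin/perl',   # assumed path
            static  => '/opt/perl-static/bin/perl',   # assumed path
        );
        my $script = 'workload.pl';                   # hypothetical workload
        my $runs   = 5;

        for my $label (sort keys %perl) {
            my $total = 0;
            for (1 .. $runs) {
                my $t0 = [gettimeofday];
                system($perl{$label}, $script) == 0
                    or die "$label run failed: $?";
                $total += tv_interval($t0);
            }
            printf "%-8s mean %.3fs over %d runs\n",
                $label, $total / $runs, $runs;
        }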

      Arbitrating over who gets it when could cost a lot and means YOU NEED MORE RAM.

      Once again: RAM/size is simply not the primary, or even a high-priority, reason for the use of DLLs. Certainly not for the usage Perl makes of them.

      That said, my currently very lightly loaded system has 51 processes running, using 4.1GB of my 8GB; and every single one of those executables is making use of anything from 10 to 100+ DLLs. If they all statically linked all of the DLLs they used, I'd probably need at least 16GB, if not 32GB, to run that same mix. And 51 is not a large number; it is frequently 200+ processes on my system; and on shared servers it can be far, far higher.
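
      On Linux (the tooling on Windows differs, but the principle is the same), a rough /proc walk like this shows how many running processes are mapping each shared library:

        #!/usr/bin/perl
        # Linux-only sketch (assumes /proc): count how many running
        # processes currently map each shared library, i.e. how widely
        # its code pages are being shared.
        use strict;
        use warnings;

        my %procs;                              # library path => process count
        for my $maps (glob '/proc/[0-9]*/maps') {
            open my $fh, '<', $maps or next;    # process may have gone away
            my %seen;
            while (<$fh>) {
                $seen{$1}++ if m{ (/\S+\.so\S*)\s*$};
            }
            $procs{$_}++ for keys %seen;
        }

        my @top = (sort { $procs{$b} <=> $procs{$a} } keys %procs)[0 .. 9];
        printf "%4d processes  %s\n", $procs{$_}, $_ for grep defined, @top;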

      Basically, you're barking up the wrong tree, chasing an optimisation that will almost never provide any benefit whatsoever. I made something of a career of optimising large and long running processes, and static vs dynamic linking has *never* been a factor. Not once.

      Update: (*) The more I think about this -- and it is just a thought experiment that I have no intention of attempting to verify -- the more I think it likely that the statically linked version may actually take longer to load than the dynamic version; and I'm quite certain that a second concurrent instance would definitely take longer.

      My rationale behind that conclusion is this: when the statically linked version is loaded, the OS will need to find a chunk of space large enough for it to fit into. This is no problem in the virtual memory space -- modern processors have 2^63 ~ 9 exabytes of address space for user code -- but in the physical address space there is far less room for manoeuvre. Whilst the virtual-to-physical mapping means that there is no need for a contiguous chunk of physical memory, it still requires the building of the page tables that perform that virtual-to-physical mapping.

      With the dynamically linked version, when the second instance of Perl is loaded, a large percentage of the DLLs it requires will already be resident in memory; for the most part, no new physical memory pages will need to be allocated -- the existing ones just need to be mapped into the new process's page tables.

      Whereas (I think) a second instance of a statically linked perl would need to (re)allocate new physical memory for all its code segments, despite the fact that copies of all of it already exist in the memory space of the first running instance.

      For one-off, long running processes that's probably insignificant in terms of overall performance; but for repetitive, transient and overlapping instantiations, as with a web server calling perl scripts, the difference might well be significant.

      Pure speculation, but based on good foundations.
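
      If anyone does want to test it on Linux, the split is directly visible in /proc: a sketch like this (assuming a kernel that exposes smaps) totals the shared versus private resident memory of a process; run it against a second concurrent perl instance and compare:

        #!/usr/bin/perl
        # Linux-only sketch (assumes /proc/<pid>/smaps): total the shared
        # vs private resident memory of one process -- run it against a
        # second concurrent perl instance to see how much is re-used.
        use strict;
        use warnings;

        my $pid = shift // die "usage: $0 <pid>\n";
        open my $fh, '<', "/proc/$pid/smaps"
            or die "cannot read smaps for pid $pid: $!\n";

        my %kb;
        while (<$fh>) {
            $kb{$1} += $2
                if /^(Shared_Clean|Shared_Dirty|Private_Clean|Private_Dirty):\s+(\d+)\s+kB/;
        }

        printf "shared: %d kB   private: %d kB\n",
            ($kb{Shared_Clean} // 0) + ($kb{Shared_Dirty} // 0),
            ($kb{Private_Clean} // 0) + ($kb{Private_Dirty} // 0);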


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
      In the absence of evidence, opinion is indistinguishable from prejudice.