BrianP has asked for the wisdom of the Perl Monks concerning the following question:

Building PerlMagick, I noticed -mtune=core2 in the compile lines. What a waste!
I fixed it by setting CFLAGS to -mtune=native. Or did I?
Because configure clobbers CFLAGS, you actually need to use --with-gcc-arch=native.
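
Something along these lines seems to do it; the source directory and perl
path here are just examples, adjust to taste:

    cd ImageMagick-6.9.3-*/ &&
    ./configure --with-gcc-arch=native --with-perl=/usr/local/bin/perl &&
    make && make install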

A Perl built the normal way (with a static libperl.a) won't link with
PerlMagick; you have to build it with a shared libperl.so. I have read
that there is roughly a 10%-30% performance penalty for
shared Perl vs. static. There is a 4-year-old thread here,
"Building Perl with static and dynamic perllib libs",
but it offers no building advice.
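
A quick sanity check for which flavour a given perl is:

    perl -V:useshrplib                 # useshrplib='true' means a shared libperl
    ldd `which perl` | grep libperl    # lists libperl.so only for the shared build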

How many of today's building/sharing/linking conventions are
hangovers from DOS days, when a meg of memory was a lot? Is
performing the runtime fire drill of assembling myriad program
fragments, hoping they still fit, just so you can have the illusion
of tiny executables, an anachronistic sub-optimization?

Then there are the zillion configure options: threading (and which type),
multiplicity, malloc (perl's own or wrapped), CFLAGS, CPPFLAGS, "-Ofast
-ffast-math -m64 -march=native -msse4.2 -mavx2 -funroll-loops
-fopenmp -flto", dyno-flags, alignment, dtrace, PerlIO, vfork??
GCC vs. Intel?
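
For concreteness, the kind of Configure line all those knobs end up on --
the values here are just one guess, not a recommendation:

    sh Configure -des -Dusethreads -Dusemultiplicity \
        -Dusemymalloc=n \
        -Doptimize='-O2 -march=native'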

The PerlMagick README has spells to (the first of which is sketched below):
o -- create and install the dynamically-loaded version of PerlMagick
o -- replace your Perl interpreter with one that has PerlMagick statically linked
o -- install a new Perl interpreter with a different name
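
Once ImageMagick itself is built and installed, the first recipe boils down
to roughly this (the directory name is whatever your source tree is called):

    cd ImageMagick-6.9.3-*/PerlMagick &&
    perl Makefile.PL &&
    make && make test && make install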

Can any project with hard-coded, Core2 code optimization place much
value on SPEED?

I would like to have one Perl which does it all, rather than multiple
versions, and it would be desirable to use as many of the slick
features of my new processor as possible. Trying with/without each
of the ~dozen independent variables above and profiling all 4096
possible Perls is beyond affordable.

What is the current Wisdom regarding Building an Efficient
Perl/PerlMagick for a modern system?


Re: Wisdom for building an Efficient Perl/PerlMagick?
by BrowserUk (Patriarch) on Mar 16, 2016 at 20:42 UTC
    Is performing the runtime fire drill of assembling myriad program fragments, hoping they still fit, so you can have the illusion of tiny executables an anachronistic sub-optimization?

    If having "the illusion of tiny executables" was the only reason, or a primary reason, (or even a good reason) for using dynamically linked libraries, it probably would be anachronistic; but since it isn't, it isn't.

    Dynamic linking allows perl to use any one of, and any combination of, thousands of cpan modules that contain one (or more) XS (C/C++/Fortran/whatever) components, without the user having to recompile their perl executable (and resolve all the conflicts, duplicate dependencies and idiocies) every time they want to add a new package to their installation.
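
    As a tiny illustration (an example, nothing specific to your build): load one XS module -- List::Util here, but any will do -- and ask DynaLoader which shared objects it pulled in at runtime:

        perl -MList::Util -le 'print for @DynaLoader::dl_shared_objects'
        # typically prints something like .../auto/List/Util/Util.so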

    The alternatives to dynamic linking are:

    • Your perl executable is huge, because it is statically linked to the (35/40%?) of the 131,312 packages on CPAN that have XS components.

      Even though you may never use 99% of them.

    • Every user re-compiles their perl executable -- having resolved all the conflicts, dependencies and idiocies -- every time they want to add a new module to their installation.
    • Every user has a few hundred or thousand different perl executables on their system.

      Each statically linked to all of the libraries required by a particular application, script or one liner.

    I have read that there is roughly a 10%-30% performance penalty for shared Perl vs static.

    News: Not everything you read on the internet is true.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
    In the absence of evidence, opinion is indistinguishable from prejudice.
      BUK, I see your point about statically linking to a
      fraction of a million libraries, each with a
      potentially large number of functions and sub-modules.
      I don't have that much RAM on hand!

      And a 1001 Trick Pony like ImageMagick Convert would
      seem like a good place for sharing.

      For specialized tools on the other end of the spectrum,
      the guaranteed costs of every shared run overwhelm the
      speculative, theoretical benefits by many orders of
      magnitude.

      Only if you can have umpteen programs running
      simultaneously and sharing the code AND if you would
      have crashed out of memory otherwise is there any
      chance of a benefit. Arbitrating over who gets it when
      could cost a lot and means YOU NEED MORE RAM.

      If your kids are squabbling over toys, you need more toys!

      You can just swap out a library and not have to recompile?
      If you rent your code and don't have the source, there
      might be a case. But, that's not the Perl Way. You would
      just download the new tgz, rebuild it from source and be
      sure it worked.

      Time is the one essential item you can't make more of.
      Settling up front for doing 4 runs per day with __SLOWS__
      libraries instead of 5 runs, forever, because you might be
      able to save a build one day sounds like a hard sell.

      And the case of needing all of cpan for one program is
      not something you see regularly. How many 1000+ module
      programs have you written? My record is definitely in the
      double digits.

      But, pulling every one of the 11 CPAN_zees in my
      hg.pl script (plus my bloated, 694kb, everything
      and the kitchen sink for the last 5 years Bpbfct.pm)
      should be easily doable.

      The average .gz size from a small sample is < 2 MB
      each. How big is the average .PM?

      loc .pm | grep "\.pm" | grep 5.22 | wc             ->  4666
      lsr -sA `loc .pm | grep "\.pm" | grep 5.22` | add  ->  87,175,440

      87MB / 4666 Perl_5.22 *.pm modules -> 18,683 bytes per .pm

      I wonder how much byte code you would get on
      average from 18k of Perl??
        How many 1000+ module programs have you written?

        None. But that's not the point, is it?

        I'm not a big cpan user and I have 486 dlls in the lib and site/lib trees of my default perl installation. Whilst I may never use more than a dozen or so of those in any individual script, I can use [sic] any of the 2.1e1088 combinations of those in any script I write without re-building that perl.

        To achieve that flexibility with a statically linked perl; I'd need to try and resolve all the conflicts between those nearly 500 libraries; and that's a task that might take years; and quite possibly would never be achievable. It's certainly a task I would not even try to take on; and why would I.

        Time is the one essential item you can't make more of. Settling up front for doing 4 runs per day with __SLOWS__ libraries instead of 5 runs, forever, because you might be able to save a build one day sounds like a hard sell.

        That math suggests a 20% speed up from static linking; and that simply isn't true. The cost of dynamic linking is a one-off cost the first time a module is loaded; after that, the cost of calling the functions within it is the same as they would be for statically linked. And the runtime of any program that runs for 5 or 6 hours, isn't going to be dominated by the time taken to load modules.

        I seriously doubt you could demonstrate a 1% difference(*) between statically vs dynamically linked version of any program that runs for more than 5 minutes. (I actually think that's true for any script that runs for more than 1 minute; but let's be conservative.) For a script that runs for 5 hours, I doubt you could reliably measure the difference.
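
        If you want to put a number on that one-off cost, a crude measurement is enough (POSIX here is just a convenient stand-in for "an XS module that gets dynamically loaded"):

            time perl -e 0            # bare interpreter startup
            time perl -MPOSIX -e 0    # startup plus dynamically loading one XS module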

        Arbitrating over who gets it when could cost a lot and means YOU NEED MORE RAM.

        Once again: RAM/size is simply not the primary, or even a high-priority, reason for the use of dlls. Certainly not for the use Perl makes of them.

        That said, my currently very lightly loaded system has 51 processes running using 4.1GB of my 8GB; and every single one of those executables is making use of anything from 10 to 100+ dlls. If they all statically linked all of the dlls they used, I'd probably need at least 16GB if not 32GB to run that same mix. And 51 is not a large number; it is frequently 200+ processes on my system; and on shared servers it can be far, far higher.

        Basically, you're barking up the wrong tree, chasing an optimisation that will almost never provide any benefit whatsoever. I made something of a career of optimising large and long running processes, and static vs dynamic linking has *never* been a factor. Not once.

        Update:* The more I think about this -- and it is just a thought experiment that I have no intention of attempting to verify -- the more I think it likely that the statically linked version may actually take longer to load than the dynamic version; and I'm quite certain that a second concurrent instance would definitely take longer.

        My rationale for that conclusion is this: When the statically linked version is loaded, the OS will need to find a chunk of space large enough for it to fit into. This is no problem in the virtual memory space -- modern processors have 2^63 ~ 9 exabytes of address space for user code -- but in the physical address space there is far less room for maneuver. Whilst the virtual-to-physical mapping means that there is no need for a contiguous chunk of physical memory, it still requires the building of the page tables that perform that mapping.

        With the dynamically linked version, when the second instance of Perl is loaded, a large percentage of the dlls it requires will already be in memory, their page table entries will already exist and for the most part, no new physical memory pages will need to be allocated and mapped.

        Whereas, (I think) a second instance of a statically linked perl would need to (re)allocate new physical memory for all its code segments, even though copies of all of it already exist in the memory space of the first running instance.

        For one-off, long running processes that's probably insignificant in terms of overall performance; but for repetitive, transient and overlapping instantiations, as with a web server calling perl scripts, the difference might well be significant.

        Pure speculation, but based on good foundations.


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Wisdom for building an Efficient Perl/PerlMagick?
by Anonymous Monk on Mar 16, 2016 at 20:05 UTC

    PerlMagick is a glue module for interfacing with ImageMagick. Heavy lifting is done by the latter. Your perl build options should be of little consequence. Check that the ImageMagick library itself is built with optimizations.
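
    A quick way to check is the Magick-config script that ships with ImageMagick (assuming it is installed and on your path):

        Magick-config --version
        Magick-config --cflags    # the CFLAGS the library was built with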

    Generally speaking, for best results you'll want link-time optimization and the latest, greatest compilers. (And -march -Ofast, if suitable.) This is especially true for routines that can be autovectorized. For perl, I doubt it matters much. FWIW, the following appears to work:

    cd perl-5.22.1 &&
    ./Configure -des -Duseshrplib -Dusethreads -Duseithreads -Dcc='clang-3.8 -flto' &&
    make install
    

    Finally, if you feel like exploring alternative options, here are some image processing benchmarks: http://www.vips.ecs.soton.ac.uk/index.php?title=Speed_and_Memory_Use . Though I've no idea which libraries (besides ImageMagick) have perl glue...

      AM,

      >> Your perl build options should be of little consequence

      As glue for IM, the Perl build should make little difference. But if I change
      the guts of Perl itself by building libperl.so instead of libperl.a,
      then everything else I do with Perl will pay the ~20% penalty for being a
      shared object.

      Is there any way that I can use the shared Perl just for PerlMagick,
      use the standard, static Perl for everything else, and keep just one set
      of CPAN modules (~50 bleepin modules I had to rebuild for the new machine)?

      Or, would I have to do parallel maintenance?

      Third option? Force a static link on PerlMagick and live with a 20MB
      Perl_The_Hutt?
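
      What I'm picturing for the parallel-maintenance route is roughly this
      (the /opt/im-perl prefix is made up; substitute whatever layout you like):

          # a second, shared perl used only for PerlMagick
          ./Configure -des -Duseshrplib -Dprefix=/opt/im-perl &&
          make && make test && make install

          # then build PerlMagick against that perl
          cd ImageMagick-6.9.3-*/PerlMagick &&
          /opt/im-perl/bin/perl Makefile.PL && make && make install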

      BORING STATIC VS SHARED TRIVIA SECTION:
      ==========================================
      I have both libperl.a and libperl.so, built the same except for sharedness.
      ldd confirms that their guts are identical except for 1 file:

      ldd `find /usr/local/bin -name perl` | psr "s/\(.*//" | freq -s
      libperl.so => /usr/local/bin/im/lib/perl5/5.22.1/x86_64-linux/CORE/libperl.so
      /usr/local/bin/im/perl:
      /usr/local/bin/perl:

      The total sizes too are interesting; the .a is about 1MB fatter:

      lsr -s `bloc -s libperl. | grep 5.22`
      /usr/local/bin/im/lib/perl5/5.22.1/x86_64-linux/CORE/libperl.so  2001856
      /usr/local/lib/perl5/5.22.1/x86_64-linux/CORE/libperl.a          3049458

      The .so version of the perl binary itself looks tiny, on the surface:

      lsr -s `find /usr/local/bin -name perl`
      /usr/local/bin/im/perl      17400
      /usr/local/bin/perl       1887720

      add 17400 2001856    ->  2019256
      add 2019256 -1887720 ->   131536
      The stump and the shared object together, however, are
      131 THOUSAND BYTES larger than boring old STATIC PERL!

      So, the shameless canard about shared libs saving disk space
      and memory has finally been INESCAPABLY DEBUNKED!
      They take up more memory, more disk space and they extort a
      surcharge against your TIME with every Invocation!

        I believe selecting a particular compiler has more of an effect on perl performance than the shared/static question. gcc-4.8 seems to produce a better executable than either gcc-5.3 or clang (possibly because it matches the toolset perl is developed against). Having done no extensive benchmarking, I cannot offer anything conclusive, however.

        I would advise you to trust the wisdom of perl maintainers (and their configure tools) and the wisdom of perl vendors (OS providers). E.g. the latest slackware-stable ships with shared perl (-Duseshrplib).

        If you believe that a different set of compile options would prove generally superior by a large margin, then this is a claim you need to substantiate. Test it, show it, prove it.
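
        A crude starting point, using the two builds mentioned earlier (a real test should use your actual workload and repeated runs):

            time /usr/local/bin/perl    -e '$s += sqrt $_ for 1 .. 10_000_000; print "$s\n"'
            time /usr/local/bin/im/perl -e '$s += sqrt $_ for 1 .. 10_000_000; print "$s\n"'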