in reply to Perl 5 Optimizing Compiler, Part 2

Shrug...

The gist of these ideas seems to be that the Perl5 implementors somehow got things desperately wrong, whereas the JS and/or Python implementors (who of course are doing much the same thing in much the same way) ... didn’t?

All of these systems work by JIT translation of source code into bytecode or into an internal data structure that is then processed by an optimized interpreter loop. Even “traditional” C++ compilers use that technique, and of course, Microsoft’s dot-Net framework is entirely based on it. The overhead of a runtime interpreter loop is frankly negligible. JIT compiling is also an affordable one-time overhead expense.

“80% of the time is spent in 20% of the code.” In many systems, those hot-spots are located in the language-common runtime libraries. In Perl et al, the hot-spots once identified can be spun off into XS subroutines. Those hot-spots will be hot-spots no matter how program-control is passed into them, and they (along with the fundamental design of the high-level program) will account for the human-visible performance impact no matter how the other 80% of the code is written.

If this were not so, then Perl, Python, Java, dot-Net, PHP and so on would never have been done this way, and billions of lines of source code would not have been developed using them. Before you pay to get something, be sure you’ll get what you paid for. In this case, you won’t.

Re^2: Perl 5 Optimizing Compiler, Part 2
by chromatic (Archbishop) on Aug 20, 2012 at 16:36 UTC
    Even “traditional” C++ compilers use that technique...

    What?

    JIT compiling is also an affordable one-time overhead expense.

    Not the tracing technique that's currently fashionable! Writing XS to do the same thing without the overhead of op dispatch won't optimize something that's slow because it uses more memory than necessary to provide more flexibility than necessary, especially if running that XS code adds a language barrier that you can't optimize across and which requires serialization and deserialization (or at least prevents you from using non-SVs).

    In Perl et al, the hot-spots once identified can be spun off into XS subroutines.

    In many cases that won't help.

    If this were not so, then Perl, Python, Java, dot-Net, PHP and so-on would never have been done this way....

    I'm sorry, but that's a non sequitur.

      Even “traditional” C++ compilers use that technique...

      What?



      Sundial is being vague again. I *think* he means that all C compilers released in the last 20 years have a "front end/back end" design with a non-machine-code bytecode in the middle. Or Sundial is talking about C++ string classes designed to sell new PCs, since they perform a full heap walk/validation on each string concatenation to stop evil hackers and phishers.

      Not the tracing technique that's currently fashionable! Writing XS to do the same thing without the overhead of op dispatch won't optimize something that's slow because it uses more memory than necessary to provide more flexibility than necessary, especially if running that XS code adds a language barrier that you can't optimize across and which requires serialization and deserialization (or at least prevents you from using non-SVs).

      In Perl et al, the hot-spots once identified can be spun off into XS subroutines.

      In many cases that won't help.

      Turning a run of perl opcodes into one XS function almost always results in a faster runtime, even if the data still moves as SVs between C functions inside that one XS function. Mark, context, and tmps stack swaps, pushmarks, PAD accesses, GV dereferencing, and wantarray context checks are all eliminated. Machine opcodes also sit in read-only memory, whereas Perl opcodes sit in read-write memory, so the CPU has more opportunity to optimize: in runops, the next perl opcode (and the next machine-code function) can't be predicted, since it sits in RW memory or in the callee's return register.

      For CPU-heavy operations such as a regexp, or for IO, there is of course no difference between XS and Perl bytecode. And very poorly written XS/C code (macro and inline abuse, plus dumb compilers that don't merge character-identical branch blocks together with jmps; Visual C, cough cough) can take more memory than the equivalent Perl bytecode.

      A C compiler can produce jumptables; the Perl compiler doesn't. (I have seen some actual implementations of jumptables on PerlMonks that didn't result in a tree of conditional opcodes; IIRC they used goto.) Perl has no C preprocessor, although its constant-folding branch elimination is sort of the same thing (I think I'm the only one in the world who intentionally uses it). Perl encourages strings in general, rather than numeric constants, for settings and hash key names. I haven't researched this, but I don't think Perl has any array/AV-backed implementation of restricted hashes (cough cough, structs). As others have said, Perl's flexibility is its performance problem.

      If there is one candidate in Perl's standard library I would rewrite in XS, it is Exporter; 99.999% of Perl modules use it. A distant second is the pure-Perl portions of DynaLoader. I don't think anything else deserves a rewrite in XS that would benefit the *whole* Perl community.

      Other possibilities: introducing new opcodes, and reducing the opcode count by stuffing more metadata into the remaining ones (up to a couple of bits of a pad offset; if the offset is 1111 out of 4 bits, look for it on the "legacy" Perl stack), or stuffing a GV's const char name into the GV-dereferencing opcode rather than putting a mark and a const string SV on the Perl stack. Desktop OS kernels don't allow time slices and inter-thread signaling fine enough for automatic parallelism (list assign to list, for example), and Perl's magic and tie monkeywrenches would cause races anyway. Writing a user-mode inter-thread synchronization and parallelism system, with busy-waiting on secondary CPUs, is beyond the scope of the Perl project.
        If there is one candidate in Perl's standard library I would rewrite in XS, it is Exporter.

        Common subexpression elimination would help my code more; I'm not that concerned about startup time anywhere but tests.

Re^2: Perl 5 Optimizing Compiler, Part 2
by bulk88 (Priest) on Aug 21, 2012 at 01:51 UTC

    All of these systems work by JIT translation of source code into bytecode or into an internal data structure that is then processed by an optimized interpreter loop. Even “traditional” C++ compilers use that technique, and of course, Microsoft’s dot-Net framework is entirely based on it. The overhead of a runtime interpreter loop is frankly negligible. JIT compiling is also an affordable one-time overhead expense.

    Well, Perl does do JIT translation already; see string eval.