in reply to Re: Perl 5 Optimizing Compiler, Part 4: LLVM Backend?
in thread Perl 5 Optimizing Compiler, Part 4: LLVM Backend?
A particular quote from that article that you cited:
Furthermore, IMHO most apps are not even bound by those limitations, but rather by programmer laziness/sloppiness. If better performance is a low hanging fruit that remains unpicked, JIT is really not going to make much of a difference. Even if it runs 2x faster, it doesn’t mean it'll be 2x cheaper to run. This is the “you are not google” axiom of scalability. Even if people want this performance, for most of them it won’t actually help their business.
I remain unconvinced that an “optimizing compiler” would actually deliver useful results as expected, i.e. enough to justify the project. And here’s why... Consider a program which did 100,000 hash-table insertions followed by 10 million lookups. How much of the time will be spent in the Perl-guts, versus the amount of time spent getting to those guts? I suspect that the performance of this hypothetical application would be determined almost exclusively by the performance of the hash-table code within the “guts,” such that time spent generating the parse-trees and then iterating through them would be negligible. If an edge-case situation were proffered as “justification” for this effort, I would feel obliged as a program manager to require proof that the edge-case was usefully large ... and that there existed a “project in great pain” with a substantial existing source-code base in Perl.
Otherwise, it would be an academic exercise. “See? I did it! (But so what?)”
Replies are listed 'Best First'.
Re^3: Perl 5 Optimizing Compiler, Part 4: LLVM Backend?
by BrowserUk (Patriarch) on Aug 27, 2012 at 23:39 UTC | |
I suspect that the performance of this hypothetical application would be determined almost exclusively by the performance of the hash-table code within the “guts,” such that time spent generating the parse-trees and then iterating through them would be negligible.

And I suspect that, once again, you haven't a clue what you are talking about. Have you ever bothered to look into hv.c? Each Perl "opcode" has to deal with a complex variety of different possibilities. With runtime compilation (or JIT), it would be possible for 'simple hash' accesses/inserts/updates to bypass all of the myriad checks and balances that are required for the general case, which could yield significant gains in hash-heavy code. Ditto for arrays. Ditto for strings. Ditto for numbers. (Do a super search for "use integer" to see some of the possibilities that can yield.)

Then there is the simple fact that perl's subroutines/methods are -- even by interpreter standards -- very slow. (See: 488791 for a few salient facts about Perl's subcall performance.) Much of this stems from the fact that, the way the perl sources are structured, C compilers cannot easily optimise across compilation-unit boundaries, because they mostly(*) do compile-time optimisations. However, there is a whole class of optimisations that can be done at either link-time or runtime, which would hugely benefit Perl code.

(*)MS compilers have the ability to do some link-time optimisations, and it would surprise me greatly if gcc doesn't have similar features. It would also surprise me if these have ever been enabled for the compilation of Perl. They would need to be specifically tested on so many platforms, it would be very hard to do. But something like LLVM can do link-time & runtime optimisations, because it targets not specific processors, but a virtual processor (a "VM"), which allows its optimiser to operate in that virtual environment.
And only once the VM code has been optimised is it finally translated into processor-specific machine code. That means you only need to test each optimisation (to the VM) once; and, independently, the translation to each processor. Not the combinatorial product of all optimisations on all processors.

What would these gains be worth? It is very hard to say, but imagine if it gave 50% of the difference between (interpreted, non-JITed) Java and Perl running a recursive algorithm (Ackermann) that does a few simple additions (11 million times): 83.6 seconds for Perl, and 1.031 seconds for Java! Perl's productivity and (1/2) Java's performance! That would be something worth having for a vast array of genomicists, physicists, data miners et al.

Heck, it might even mean that hacks like mod_perl might become redundant, making a whole bunch of web monkeys happy. Moose might even become usable for interactive applications. Parse::RecDescent might be able to process documents in real time rather than geological time. DateTime might be able to calculate deltas as they happen rather than historically.

There are three fundamental limitations on an interpreter's performance. Whilst Perl is faster than (native code) Python & Ruby, it sucks badly when compared to Java, Lua etc.

In a nutshell, your "suspicions" are so out of touch with reality, and so founded upon little more than supposition, that they are valueless.

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] |
by chromatic (Archbishop) on Aug 28, 2012 at 00:00 UTC | |
You're half right. hv.c does demonstrate one of the big problems in optimizing Perl 5, but no amount of static optimization fairy dust will help. The biggest problem with regard to SVs and unoptimizability is that the responsibility for determining how to access what's in the SV is in every op. That's why every op that does a read has to check for read magic, and every op that does a write has to check for write magic. That's why ops are fat in Perl 5. (That's why ops are fat in Parrot, which still has a design that would make it a much faster VM for Perl 5.6.) Moving magic into SVs from ops would help, as would porting the Perl 5 VM to C++, which can optimize for the type of static dispatch this would enable.

LLVM would only really help if you could compile all of Perl 5, and the XS you want for any given program, to LLVM IR and let LLVM optimize across the whole program there; but even then you still have to move magic into the SVs themselves (or spend a lot of time and memory tracing types and program flow at runtime) to be able to optimize down to a handful of processor ops. I suspect no one's going to do that for a 10% performance improvement at the cost of 10x memory use.

With that said, I must disagree with:

Perl's particular brand of preprocessor macro-based, Virtual Machine was innovative and way ahead of its time when it was first written.

Not if you look at a good Forth implementation or a decent Smalltalk implementation, both of which you could find back in 1993.

Perl's current memory allocator has so many layers to it, that it is nigh impossible to switch in something modern, tried and tested, like the Boehm allocator.

I don't see how Boehm would help. The two biggest memory problems I've measured are that everything must be an SV (even a simple value like an integer) and that there's no sense of heap versus stack allocation.
Yes, there's the TARG optimization, and that helps a lot, but if you want an order-of-magnitude speed improvement, you have to avoid allocating memory where you don't need it.

Someone decided that rather than use the hardware-optimised (and constantly re-optimised with each new generation) hardware stack for parameter passing, it was a good idea to emulate the (failed) hardware-based, register-renaming architecture of (the now almost obsolete) RISC processors, in software.

You're overlooking two things. First, you can't do anything interesting with continuations if you're tied to the hardware stack. Second, several research papers have shown that a good implementation of a register machine (I know the Dis VM for Inferno has a great paper on this, and the Lua 5.0 implementation paper has a small discussion) is faster than the equivalent stack machine. I think there's a paper somewhere about a variant of the JVM which saw a measurable speed improvement by going to a register machine too. (Found it in the bibliography of the Lua 5.0 paper: B. Davis, A. Beatty, K. Casey, D. Gregg, and J. Waldron. The case for virtual register machines.)

... but with all that said, Parrot's lousy calling convention system is not a good example of a register machine. A good register machine lets you go faster by avoiding moving memory around. Parrot's calling conventions move way too much memory around to go fast. | [reply] |
by BrowserUk (Patriarch) on Aug 28, 2012 at 07:46 UTC | |
Up front: when you nay-say the OP's discussion, I, like many others I suspect, read each sentence twice, consider it thrice, and then stay schtum. You have the knowledge and experience to contribute to the OP's endeavours, even when you do so with negative energy. You can save the OP from many blind alleys. When sundial "contributes" his 'stop energy' (skip directly to 33:24), there is no knowledge, no experience, nothing but the negative energy of his groundless suppositions.

LLVM would only really help if you could compile all of Perl 5 and the XS you want for any given program to LLVM IR and let LLVM optimize across the whole program there, but even then you still have to move magic into the SVs themselves (or spend a lot of time and memory tracing types and program flow at runtime) to be able to optimize down to a handful of processor ops.

Are you 100% sure there would be no gains? Just for a minute, suspend your disbelief and imagine that all of perl5.x.x.dll/.so was compiled (otherwise unmodified wherever possible) to LLVM's IR. And then, when that .dll/.so is linked, all the macros have been expanded and in-lined, all the do{...}while(1) blocks are in-situ, and all the external dependencies of all the compile-time scopes are available. Are you 100% certain that, under those circumstances, the link-time optimiser isn't going to find substantial gains from its supra-compile-unit view of that code?

Now suspend your disbelief a little further and imagine that someone had the energy and time to use LLVM's amazingly flexible, platform-independent, language-independent type system (it can do 33-bit integers or 91-bit floats if you see the need for them) to re-cast Perl's internal struct-based type inheritance mechanism into a concrete type-inheritance hierarchy. What optimisation might it find then? C treats structs as opaque lumps of storage, and has no mechanisms for objects, inheritance or any extensions of its storage-based types.
But (for example) C++ has these concepts, and as you say, "porting the Perl 5 VM to C++ which can optimize for the type of static dispatch": if you could port Perl's type hierarchy to C++, then its compilers should be able to do more by way of optimising them. But porting perl to C++ would be a monumental task, because it would require re-writing everything to be proper, standards-compliant C++. Properly OO, with all that entails.

LLVM doesn't impose any particular HLL's view of the world on the code. LL stands for low-level. It doesn't impose any particular type mechanism on the code; it will happily allow you to define a virtual machine (VM) that uses 9-bit words and 3-word registers. Isn't it just possible that it might allow the Perl VM to be modeled directly, such that -- with the overview that link-time optimisation has -- it can produce some substantial runtime benefits? And just maybe allow the pick-up-sticks nature of the Perl internals to be cleaned up along the way?

And finally, there is the possibility that its JIT capabilities may be able to recognise (at runtime) when a hash(ref) is 'just a hash', and optimise away all the tests for magic, stashes, globs and other variations, and so fast-path critical sections of code at runtime. What percentage of Perl's opcode usage actually uses those alternate paths? 10%? 5%? Doesn't that leave a substantial amount of Perl code as potentially JITable to good effect? Whether LLVM's JIT is up to the task is a different question -- one that would be answered if we could try it.

Not if you look at a good Forth implementation or a decent Smalltalk implementation, both of which you could find back in 1993.

I was using Digitalk's Smalltalk/V PM at around that time, and it was dog slow.
Forth compilers were making strides using their interlaced-opcodes technology (called threaded interpreted code back then, but that has different connotations these days), but a) those interpreters were in large part hand-coded in assembler; b) you had to write your programs in Forth. Like Haskell, it's a different mindset, largely out of reach of the sysadmins, shell & casual programmers that Perl targeted.

Defining a language that targets a VM defined in (back then) lightweight C pre-processor macros, and throwing it at the C compilers to optimise, was very innovative. The problem is that the many heavy-handed additions, extensions and overzealous "correctness" drives have turned those once-lightweight opcode macros into huge, heavyweight, scope-layered, condition-ridden lumps of unoptimisable boiler-plate. Most of which very few people have ever even taken the time to expand out and look at. Basically, no one really knows what the Perl sources actually look like. Too many heavy hands on the tiller pulling it every which way, as the latest greatest fads come and go, have left us with an opaque morass of nearly untouchable code. (That is in no way to belittle the mighty efforts of the current (and past) maintainers; but rather to acknowledge the enormity of their chosen task!)

I don't see how Boehm would help.

I'm not sure that it would either, but the main problem is that it would be nigh impossible to try it. Somewhere here (at PM), I documented my attempts to track through the myriad #definitions and redefinitions that make up Perl's memory manager -- it ran to (from memory, literally) hundreds of *alloc/*free names. Impossible to fathom.

On Windows, as built by default (and AS; and to my knowledge Strawberry), the allocator that gets used can quite easily (using pretty standard Perl code) be flipped into a pathological mode where almost every scalar allocation or reallocation results in a page fault.
Documented here. Much of my knowledge of using Perl in a memory-efficient manner has come about simply as a result of finding ways to avoid that pathological behaviour.

Another big part of the memory problem is the mixing of different allocation sizes within a single heap. Whilst the allocator uses buckets for different-sized entities, mixing fixed-sized entities -- scalars, RVs, ints, floats etc. -- and variable-sized entities -- strings, AVs etc. -- in the same heap means that you inevitably end up with creeping fragmentation. Imagine an allocator that used different heaps for each fixed-sized allocation; and another two heaps for variable-sized allocations, which it flip-flops between when it needs to expand the variable-sized heap. Instead of reallocing in place, it grabs a fresh chunk of VM from the OS, copies the existing strings over to the new heap, and discards the old one, thereby automatically reclaiming fragmentation. Don't argue the case here; I've omitted much detail. But the point is that, as-is, it is simply too hard to try bolting a different allocator underneath Perl, because what is there is so intertwined.

You're overlooking two things. First, you can't do anything interesting with continuations if you're tied to the hardware stack.

Are continuations a necessary part of a Perl-targeted VM? Or just a theoretically interesting research topic-du-jour? From my viewpoint, the fundamental issue with the Parrot VM was, and is, the notion that it should be all things to all men. Every theoretical nice-to-have and every cool research topic of the day was to be incorporated, in order to support the plethora of languages that were going to magically inter-operate atop it. Cool stuff if you have Master's-level researchers on research budgets and academia's open-ended time frames to play with. But as a solution to the (original) primary goal of supporting Perl 6 ...
Second, several research papers have shown that a good implementation of a register machine (I know the Dis VM for Inferno has a great paper on this, and the Lua 5.0 implementation paper has a small discussion) is faster than the equivalent stack machine.

Research papers often have a very particular notion of equivalence. As often as not, such comparisons are done using custom interpreters that assume unlimited memory (no garbage collection required), supporting integer-only baby-languages running contrived benchmarks for strictly limited periods on otherwise quiescent machines that are simply switched off when memory starts to exhaust. So unrepresentative of running real languages on real workloads in real-world hardware environments, that their notion of equivalence has to be taken very much in the light of the research they are conducting.

Is there a single, major real-world language that uses continuations? Is there a single, real-world, production-use VM that emulates a register machine in software? Why have RISC architectures failed to take over the world?
| [reply] |
by chromatic (Archbishop) on Aug 28, 2012 at 16:44 UTC | |
by BrowserUk (Patriarch) on Aug 28, 2012 at 18:20 UTC | |
| |
by dave_the_m (Monsignor) on Aug 28, 2012 at 20:21 UTC | |
by BrowserUk (Patriarch) on Aug 28, 2012 at 22:36 UTC | |
| |
by Anonymous Monk on Aug 28, 2012 at 02:17 UTC | |
AutoXS::Accessor - Identify accessors and replace them with XS | [reply] |
by bulk88 (Priest) on Aug 28, 2012 at 10:22 UTC | |
64-bit VC builds have been LTCG from basically day 1: http://perl5.git.perl.org/perl.git/commit/d921a5fbd57e5a5e78de0c6f237dd9ef3d71323c?f=win32/Makefile. A couple of months ago I compiled a Perl with LTCG in 32-bit mode; unfortunately I don't remember the VC version, whether it was my 2003 or my 2008. The DLL got slightly fatter (I don't remember how many KB) from inlining, but the inlined functions still existed as separate function calls, the assembly looked the same everywhere, and I didn't find anything (looking around randomly by hand) that got a non-standard calling convention except what already was static functions. I wrote it off as useless. 2003 vs 2008 for 32-bit code might make all the difference though. I decided it wasn't worth writing a patch up and submitting it to P5P to change the makefile.

With Will's LLVM proposal, I believe nothing will come of it unless some or all of the pp_ opcode functions, along with runops, are rewritten in "not C", or Perl opcodes are statically analyzed and converted to native machine data types, with SVs being gone. All the "inter-procedure optimizations" mentioned in this thread are gone the moment you create a function pointer; that is simply the rules of C and C's ABI on that OS: http://msdn.microsoft.com/en-us/library/xbf3tbeh%28v=vs.80%29.aspx. I went searching through perl's pre- and post-preprocessor headers. I found some interesting things which prove that automatic IPO, on Perl, in C, with any compiler, is simply impossible.

Now in C++, in theory, calling conventions don't exist unless you explicitly force one. The compiler is free to choose how it wants to implement vtables/etc. MS's Visual C does do some pretty good "random" calling conventions for static C functions on 32-bit x86, IMHO. For 64-bit x86, Visual C never deviated from the one and only calling convention. The question is: are there any compilers daring enough to create a whole DLL/SO which contains exactly 1 function call in C?
Not any professional compiler. On some OSes (x64 windows), ABI is enforced through OS parsing of assembly code (x64 MS SEH, technically not true, if you are careful, the OS will never have a reason to parse your ASM). And on some CPUs (SPARC) calling conventions are enforced in hardware. Another danger, there is a fine line between inlining/loop unrolling, and making your L1 and L2 Caches useless. Blindly inlining away all function calls will cause a multi MB object file per Perl script that won't solve anything. | [reply] [d/l] |
by BrowserUk (Patriarch) on Aug 28, 2012 at 11:04 UTC | |
I went searching through perl's pre and post preprocessor headers. I found some interesting things which prove that automatic IPO, on Perl, in C with any compiler is simply impossible.

But LLVM isn't a C compiler. It can compile C (amongst many other languages), but it doesn't (have to) follow C conventions. LLVM is far more an assembler targeting a user-definable virtual processor. As an illustration of the sorts of things it can and does do: can you think of any other compiler technology that will generate 832-bit integers as a part of its optimisation pass?

You have to stop thinking of LLVM as a C compiler before you can even begin to appreciate what it is potentially capable of. It is weird, and to my knowledge unique. In a world where everything -- processors, memory, disk, networking et al. -- is being virtualised, why not virtualise the compiler: have it target a (user-configurable) virtual processor, and produce not just platform independence, but processor-architecture independence and source-language independence?

Can it really tackle a hoary ol' dynamic language and apply those principles to it successfully? The simple answer is: I do not know. But neither does anyone else! Stop nay-saying based upon your knowledge of what C compilers do, and follow the mantra: (Let someone else) Try it!

I first installed LLVM here (going by the date on the subdirectory) on the 6th May 2010:
I've been playing with it and reading about it on and off ever since, and I still keep learning new things about it all the time. It is unlike anything I've come across before, and defies my attempts at description. Virtual compiler, virtual interpreter, virtual assembler. Take your pick; or all 3. Give it a try (or at least a read) before you summarily dismiss it out of hand.
| [reply] [d/l] |
by BrowserUk (Patriarch) on Aug 28, 2012 at 11:44 UTC | |
Sorry for the second reply, but I responded before seeing the stuff below the code block.

For 64-bit X86, Visual C never deviated from the 1 and only calling convention. The question is, are there any compilers daring enough to create a whole DLL/SO which contains exactly 1 function call in C?

Just this morning I read that the LLVM JIT had, until recently, a 16MB limitation on its JIT'ed code size, which is now lifted. Besides which, I don't believe that you need to optimise across function boundaries to get some significant gains (over C compilers) out of the Perl sources. You pointed out that many of perl's opcodes and functions are huge. Much of the problem is not just that they are huge, but also that they are not linear. The macros that generate them are so heavily nested, and so frequently introduce new scopes and unwieldy asserts, that C compilers pretty much give up trying to optimise them, because they run out of whatever resources they use when optimising. Too many levels of scope is a known inhibitor of optimisers. That's where inlining can help. Will LLVM fare any better? Once again, we won't know for sure unless someone tries it.

Another danger, there is a fine line between inlining/loop unrolling, and making your L1 and L2 Caches useless. Blindly inlining away all function calls will cause a multi-MB object file per Perl script that won't solve anything.

Once again I ask: are you sure? If the JIT can determine that this variable -- hash(ref), array(ref) or scalar -- is not tied, has no magic, and never changes its type -- IV or NV to PV, or vice versa -- within a particular loop, then it can throw away huge chunks of conditional code. Similarly for UTF/non-UTF string manipulations; similarly for all the context stuff for non-threaded code on threaded builds. Note: I say "can", not will. The only way we will collectively know for sure if it will, is to try it.
| [reply] |
by locked_user sundialsvc4 (Abbot) on Aug 28, 2012 at 00:44 UTC | |
It is the fundamental nature of a programming language like Perl that the opcodes can be presented with many different situations, as you described. And it knows how to deal with them, so that the programmer does not have to. To avoid having the interpreter do all of these things, you must introduce strong typing into the language. Which Perl emphatically does not have. You must restrict the type of parameters that can be passed into a given subroutine, so that the compiler can make the correct determination(s) statically. You must also be able to prove that the operation of the compiler, and therefore of the generated code, is correct: that your statically-determined checks are both complete and correct; that no other program behavior is possible.

I argue that the Perl language does not possess the necessary semantics, and that it was purposely designed not to require them. As a language, it is a product of its intended implementation method; of DWIM and all of that. And I argue that these characteristics impose that implementation method to the exclusion of all others. If you want strong typing, use any one of many languages that provide it. Those languages provide the semantic detail that your compiler will require. Without them, you will find that you can't do it. The Perl language does not possess them and it never did. And I think that this is what the professor was saying when he said it would be a good project for an intern, where you could always stop at any time and say you won. As I have politely said before, each of us has different core competencies, and language/compiler/interpreters happen to be one of mine. | |
|
Re^3: Perl 5 Optimizing Compiler, Part 4: LLVM Backend?
by Will_the_Chill (Pilgrim) on Aug 27, 2012 at 21:18 UTC | |
| [reply] |
by bulk88 (Priest) on Aug 28, 2012 at 11:08 UTC | |
And what are the resources you can supply for your idea (time, $, PhDs, patents, etc)? If you are going to do it yourself, just do it. You don't need anyone's permission. Perl is open source. Once you have a working prototype with before-and-after benchmarks, you will start to attract volunteers and corporate users, and the rest is history. Not to mention Perl does have hooks for optimizers and "custom" opcodes. You can always JIT an optimizable Perl sub into what perl thinks is an XSUB; just change the right fields in the CV struct.

If I were you, I would hire a couple of programmers with knowledge of compiler design/HLA/interpreter VMs, select a couple of Perl subs no more than 1 screenful each off of CPAN (the selection doesn't need to be random, there can be a bias, but each must have enough ";"s), and have the programmers try and see if they can compile the Perl subs from A. human Perl, or B. bytecode Perl, to one or more popular bytecode interpreters: web-grade Javascript, LLVM, .NET CIL, Java bytecode, really anything. Measure the benchmarks of the before and after. If it's faster, great: publish it; you don't need anyone's permission. If it's slower, try a different VM target. Your real work is how to convert Perl to something closer to the hardware. Anyone can write a Perl interpreter that targets any other programming language that is Turing complete, but speed would be much worse than today. I don't know what perlito's current speed is against perl. Someone should research that.

Maybe the question you really want to ask is not whether Perl 5 can use LLVM, but what is the future of LLVM by itself? I will guess the future is excellent: it is OS X's primary compiler. Apple seems to be the primary sponsor and financier of LLVM. It is not going away anytime soon.

update: there are other people who have ideas on JIT-to-machine-code in perl: http://www.nntp.perl.org/group/perl.xs/2012/07/msg2709.html | [reply] |
by Will_the_Chill (Pilgrim) on Aug 29, 2012 at 21:31 UTC | |
Sorry for the delayed reply, the thread is getting long and twisty! Imagine my edge cases are something like an entire operating system or some high-performance parallel code or an attempt at strong AI. There are many kinds of code that would benefit greatly from general-purpose runtime optimization. Yes, I have such code. I can supply my own time, as well as the time of my modest Perl team. I can supply coordination efforts and put together funding for qualified coders. I can even code. I'm not sure what patents you think we would need? I'm not looking for permission to do this, I'm looking for a WAY to do this. So far my options all seem to require some amount of speculation and collaboration with the most talented coders around. Fun! Yes, the general idea is to put together funding to hire a few programmers and have them target various backends such as LLVM. I think LLVM may have a very bright future. I looked at the stuff you linked from David Mertens about the Tiny C compiler, but I'm not sure how it relates to what we're doing? Thanks, ~ Will | [reply] |
by bulk88 (Priest) on Aug 29, 2012 at 23:41 UTC | |