Beefy Boxes and Bandwidth Generously Provided by pair Networks
Pathologically Eclectic Rubbish Lister

Refactoring Perl5 with Lua

by rje (Deacon)
on Oct 21, 2014 at 18:31 UTC ( [id://1104594] : perlmeditation . print w/replies, xml ) Need Help??

WARNING: It may be that I'm simply thinking about Parrot in a different way...

If you've read my previous post on microperl, then you're sufficiently prepared to take this post with a grain of salt. As a brief summary, I'll re-quote something Chromatic wrote to start me thinking about this problem in general:

"If I were to implement a language now, I'd write a very minimal core suitable for bootstrapping. ... Think of a handful of ops. Think very low level. (Think something a little higher than the universal Turing machine and the lambda calculus and maybe a little bit more VMmy than a good Forth implementation, and you have it.) If you've come up with something that can replace XS, stop. You're there. Do not continue. That's what you need." (Chromatic, January 2013)

Warning: I've never written a VM or a bytecode interpreter. I have written interpreters and worked with bytecodes before (okay, a 6502 emulator, but that's basically a bytecode interpreter, right?) Just remember that I'm not posting from a position of strength.

So I found the Lua opcode set, and it seems a good starting point for talking about a small, though perhaps not minimal, Turing machine that seems to do much of what Chromatic was thinking about... except for XS, which I still haven't wrapped my head around.

Lua has a register-based 35 opcode VM with flat closures, threads, coroutines, incremental garbage collection... and manages to shoehorn in a tail call, a "for" loop, and a CLOSURE for goodness' sake. And some of those opcodes could be "macros" built on top of other opcodes, rather than atomic opcodes (only if speed were unimportant): SUB, MUL, DIV, POW, LE.

Again, a disclaimer: I haven't been in a compiler construction class for 25 years, and my career has typically been enterprise coding, data analysis, and tool scripting. Regardless, a small opcode set seems to me to be important for portability. And... 35 codes... well, that's dinky.

I don't assume that Lua's codes are sufficient for Perl... things are likely missing or just not quite right for Perl. But I have to start somewhere, right? And I figure some of you have the right Domain Knowledge to shed some light on the subject. Right?

There's lots of neat notes in the aforementioned Lua design doc, written in a clear and concise manner. And now for a brief glance at Lua's opcodes:

MOVE A B R(A) := R(B) LOADK A Bx R(A) := K(Bx) LOADBOOL A B C R(A) := (Bool)B; if (C) PC++ LOADNIL A B R(A) := ... := R(B) := nil GETUPVAL A B R(A) := U[B] GETGLOBAL A Bx R(A) := G[K(Bx)] GETTABLE A B C R(A) := R(B)[RK(C)] SETGLOBAL A Bx G[K(Bx)] := R(A) SETUPVAL A B U[B] := R(A) SETTABLE A B C R(A)[RK(B)] := RK(C) NEWTABLE A B C R(A) := {} (size = B,C) SELF A B C R(A+1) := R(B); R(A) := R(B)[RK(C)] ADD A B C R(A) := RK(B) + RK(C) SUB A B C R(A) := RK(B) - RK(C) MUL A B C R(A) := RK(B) * RK(C) DIV A B C R(A) := RK(B) / RK(C) POW A B C R(A) := RK(B) ^ RK(C) UNM A B R(A) := R(B) NOT A B R(A) := not R(B) CONCAT A B C R(A) := R(B) .. ... .. R(C) JMP sBx PC += sBx EQ A B C if ((RK(B) == RK(C)) ~= A) then PC++ LT A B C if ((RK(B) < RK(C)) ~= A) then PC++ LE A B C if ((RK(B) <= RK(C)) ~= A) then PC++ TEST A B C if (R(B) <=> C) then R(A) := R(B) else PC++ CALL A B C R(A), ... ,R(A+C-2) := R(A)(R(A+1), ... ,R(A+B-1)) TAILCALL A B C return R(A)(R(A+1), ... ,R(A+B-1)) RETURN A B return R(A), ... ,R(A+B-2) (see note) FORLOOP A sBx R(A)+=R(A+2); if R(A) <?= R(A+1) then PC+= sBx TFORLOOP A C R(A+2), ... ,R(A+2+C) := R(A)(R(A+1), R(A+2)); TFORPREP A sBx if type(R(A)) == table: R(A+1):=R(A),R(A):=next; SETLIST A Bx R(A)[Bx-Bx%FPF+i] := R(A+i), 1 <= i <= Bx%FPF+1 SETLISTO A Bx CLOSE A close stack variables up to R(A) CLOSURE A Bx R(A) := closure(KPROTO[Bx], R(A), ... ,R(A+n))

Interesting note: "Tables" were originally just hashes. Arrays are Tables with integer keys. With Lua 5, the Table is a multi-part structure: a hash part AND an array part, either of which may be empty, but are coordinated. This is for the sake of pure array representation: with an array, no keys are necessarily needed, so sequential values indexed from 0 get thrown into the array. Then when someone puts something ridiculous in the table, like an index of 50_000_000, Lua may elect to put that in the hash half. This is all hidden from the user, though.

"if a table is being used as an array, it performs as an array, as long as its integer keys are dense."

UPDATE: Compare with Lorito's proposed opcode set:

WARNING. I've interpolated how *I* think each opcode might perform its function. Caveat emptor.

Control Flow (5 ops) NOOP do nothing CALL call sub, method, or vtable func END halt interpreter cleanly GOTO IF Value (12 ops) The Value, Math, Comparison and Object ops are designed to minimize op proliferation. Apart from type coercion via coerce_[ins]_[ins], all ops require that each of their arguments be of the same type. I = int N = number S = string COERCE_[T]_[T] A B coerce registers between primitive types (objects will provide the same functionality via vtable functions) LOAD_CONST_[T] A K R(A) := K load a constant SET_[T] A B R(A) := R(B) copy contents of B into A (same type required) Math (14 ops) ADD_[in] A B C R(A) := R(B) + R(C) SUB_[in] A B C R(A) := R(B) - R(C) MUL_[in] A B C R(A) := R(B) * R(C) DIV_[in] A B C R(A) := R(B) / R(C) MOD_[in] A B C R(A) := R(B) % R(C) AND A B C R(A) := R(B) & R(C) OR A B C R(A) := R(B) | R(C) NOT A B C R(A) := R(B) ~ R(C) XOR A B C R(A) := R(B) ^ R(C) Comparison (2 ops) CMP_[in] A B C R(A) = R(B) cmp R(C) (same type required) Objects (5 ops) NEW A R(A) := new PerlThing() create a new o +bj NEW_[ip] A B R(A) := new PerlThing(R(B)) create with in +it GET_ATTR A B C R(A) := R(B).$!(R(C)) get an object attribute's v +alue SET_ATTR A B C R(A).$!(R(B)) := R(C) set an object attribute's v +alue Other (1 op) LOADLIB A load(R(A)) load an HLL(?) library I/O (4 ops, bootstrapping only) These four ops are only intended for use during Lorito's implementatio +n phase. Once Lorito matures to the point where we have a fixed set of ops and have tools that can generate Lorito, we expect these ops to become met +hods on a FileHandle. READ A R(A) := STDIN.nextChar() read from stdin WRITE A STDOUT.write(R(A)) write to stdout SAY A STDOUT.writeln(R(A)) write + "\n" GRIPE A STDERR.write(R(A)) write to stderr

Replies are listed 'Best First'.
Re: Refactoring Perl5 with Lua
by Laurent_R (Canon) on Oct 26, 2014 at 12:13 UTC
    I can't really comment on your project, but you mentioned Parrot, you might also want to take a look at MoarVM and NQP (note quite Perl), two more recent tools used in the development of Perl 6.
      But that has nothing to do with what Chromatic wrote. He has been very critical of MoarVM and NQP to my knowledge.

      Yes. I've read the old Parrot book, including looking through the intermediate language opcodes, which were pretty fun to read about.

      I think the solution is related to that, to some degree. The language has to handle structures that Perl uses -- maybe PMCs, maybe not -- and it ought to be able to register subs or packages or classes, because it will also have to do what XS does today.

      So it's not Lua's opcode set, that's for sure. But it might borrow some ideas from there.

      Heck, it might end up being some Frankensteinian, functional subset of C library functions, plus Perl's functions, plus a way to register new functions with the core. Or something. I don't know. Still mulling it over.