in reply to Pre-position musing on "standalone executables"

Many moons ago, Mike Cowlishaw added a feature to his REXX interpreter: the first time a script was run, it attached the compiled bytecode to the end of the source code. Under the DOS-like OS/2, this was done by placing the bytecode after the ^Z character that delimited the end of the source module. On the second and subsequent runs, the interpreter looked for a signature string after the first ^Z in the file; if it found one, it loaded and ran the bytecode without needing to re-interpret the source. The nice thing about the mechanism, besides the quicker startup, was that (DOS) text editors stopped reading once they encountered the ^Z, so the bytecode was automatically thrown away whenever the script was edited and saved, and the interpreter would recompile it the next time the script was run. A nice, simple mechanism for keeping the source and compiled versions in sync, and about the only good use I've encountered for the difference between text-mode and binmode files.
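
To make the mechanism concrete, here is a minimal sketch of it in Python. It is only an illustration of the idea as described above, not the actual REXX implementation: the `SIGNATURE` value and all function names are made up for the example, and the "editor" is modelled simply as something that never reads past the first ^Z.

```python
CTRL_Z = b"\x1a"            # DOS end-of-text marker
SIGNATURE = b"%BYTECODE%"   # hypothetical magic string, not the real REXX one


def attach_bytecode(source: bytes, bytecode: bytes) -> bytes:
    """Return the script image with the bytecode appended after a ^Z."""
    if CTRL_Z in source:
        raise ValueError("source already contains a ^Z")
    return source + CTRL_Z + SIGNATURE + bytecode


def split_script(image: bytes):
    """Return (source, bytecode); bytecode is None if no signature follows the first ^Z."""
    source, sep, tail = image.partition(CTRL_Z)
    if sep and tail.startswith(SIGNATURE):
        return source, tail[len(SIGNATURE):]
    return source, None


def dos_editor_roundtrip(image: bytes, edit) -> bytes:
    """Model a DOS text editor: it stops reading at ^Z and saves only what it saw."""
    text = image.partition(CTRL_Z)[0]
    return edit(text)
```

After a round trip through the modelled editor, `split_script` reports no bytecode, which is the cue for the interpreter to recompile and re-attach.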

It also made it possible to distribute the 'compiled' version without the source, by using a utility that removed everything between (the REXX equivalent of) the shebang line and the ^Z. This meant that if the file was edited, the bytecode was discarded automatically, and there was nothing left except the shebang line.

Not that it was that hard to reverse engineer the bytecode, but you invariably ended up with something quite different from the original, without comments or meaningful variable names, so it gave about the same degree of protection as distributing Java bytecode.

I wonder if a similar mechanism couldn't be used for Perl scripts. When the script is read, if a particular byte or series of bytes is encountered, followed by a magic-style signature sequence, then Perl would just load the bytecode and ignore the source.

In turn, this would allow the construction and distribution of a light-weight Perl binary that didn't carry the weight of the source code parser and associated stuff, leaving just the bytecode interpreter.

This might restrict some language features, eval and the like, but (to me) that seems a small penalty. Then again, I rarely use eval anyway.


Okay you lot, get your wings on the left, halos on the right. It's one size fits all, and "No!", you can't have a different color.
Pick up your cloud down the end and "Yes" if you get allocated a grey one they are a bit damp under foot, but someone has to get them.
Get used to the wings fast cos its an 8 hour day...unless the Govenor calls for a cyclone or hurricane, in which case 16 hour shifts are mandatory.
Just be grateful that you arrived just as the tornado season finished. Them buggers are real work.

Re: Re: Pre-position musing on "standalone executables"
by jryan (Vicar) on Nov 16, 2002 at 00:02 UTC
    I don't think there is any reason to do it like this for Perl 6, and I'm not sure it's even possible (for instance, what if the script uses a virtual __DATA__ file?). The run-time engine (code-named Parrot) is separate from the compiler/bootstrapper, and already has the ability to emit bytecode (in fact, it is central to the design). Compiling your application into bytecode or an executable will be as simple as hitting a switch on the command line.

      After a long thunk, I think I thunked of a solution to the 'problem', though I realise that it's only a problem if you see any merit in the idea of keeping the bytecode with the source:)

      Basically, if the functionality of Inline::Files gets cleaned up (I've had some problems with it, but that's probably me trying to use it in ways it shouldn't be used) and rolled into the Perl 6 base, all we would need is another predefined section, e.g. __BYTECODE__, and it could be done. Of course, as John points out, the ^Z trick wouldn't work on most platforms, which means we would need to use a checksum of the source embedded in the bytecode section. It might also be unfriendly to have binary data stuffed on the end of the source code; some editors would freak. That could be addressed by encoding it with some simple, fast transformation like Base64 or similar.

      The Parrot engine would simply look for a __BYTECODE__ section and, if it found one, decode the Base64, compute a CRC or MD5 of the file (excluding that section), and, if the result matched the checksum embedded in the bytecode section, just load and run it.
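
The check described above could look roughly like the following Python sketch. Everything in it is an assumption made for illustration: the section layout (MD5 hex digest on the first line, Base64 payload after it), the `MARKER` string and the function names are invented here, and nothing like this exists in Parrot or Perl. MD5 is used only to detect that the source changed, not for any security purpose.

```python
import base64
import hashlib

MARKER = "\n__BYTECODE__\n"   # section name proposed in the post; layout is hypothetical


def embed(source: str, bytecode: bytes) -> str:
    """Append a __BYTECODE__ section: source digest, then Base64-encoded bytecode."""
    digest = hashlib.md5(source.encode("utf-8")).hexdigest()
    payload = base64.b64encode(bytecode).decode("ascii")
    return f"{source}{MARKER}{digest}\n{payload}\n"


def load(script: str):
    """Return the bytecode if the section exists and still matches the source, else None."""
    source, sep, section = script.partition(MARKER)
    if not sep:
        return None                      # no section: compile from source
    digest, _, payload = section.partition("\n")
    if digest.strip() != hashlib.md5(source.encode("utf-8")).hexdigest():
        return None                      # source was edited: recompile
    return base64.b64decode(payload.strip())
```

A `None` result means "compile from source (and perhaps rewrite the section)"; anything else is bytecode that can be loaded directly.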

      I guess what I see as the benefit is avoiding the need for make-like tools to ensure that compiled units match their source code. Separate files are OK, but then you get into the whole game of tracking the separate elements.



Re: Re: Pre-position musing on "standalone executables"
by John M. Dlugosz (Monsignor) on Nov 15, 2002 at 22:56 UTC
    Since the text editor won't be automatically throwing away the compiled version upon a save, you don't want to ignore the text. Instead, check a checksum against one in the compiled part to see if it's still the same.
Re: Pre-position musing on "standalone executables"
by Abigail-II (Bishop) on Nov 19, 2002 at 14:24 UTC
    This isn't going to address the mentioned problems. Sure, you can distribute the bytecode. But that means your program shouldn't use Tk, DBI, Date::Calc, or any other XS module. Many programs do use dynamic libraries in one way or another.

    Abigail

      I agree, being able to capture and store the compiled bytecode would only be a part of the solution. Providing for run-time fix-up of calls to dynamic library routines is another. I can see how this could be done efficiently under Win32 (sorry to pollute your eye-space with that dirty word:), but my knowledge of unix-like systems is insufficient to contribute a solution for them. In the absence of that knowledge, I withheld my ideas.

      Perhaps if you could provide some possibilities for runtime fixup for the Unix systems, and we could find someone with expertise of Mac systems, we could between us offer a complete solution to that part of the puzzle?



        There are more than a dozen Unix and Unix-like systems, not counting the various versions of each. For each of them, Perl can be built in many different ways (a "smoke" tries out more than 25 variations, and that's far from all the possibilities). Some builds are binary compatible, but most aren't.

        This is the reason why CPAN doesn't have a binary archive. It will just take too much space. It's a problem that doesn't have a realistic solution, other than the one we already have.

        Abigail