zli034 has asked for the wisdom of the Perl Monks concerning the following question:

I have some compiled code written in a very simple language that is very similar to C. I want to reverse it back to source code to study the mechanism behind the machine code. The API of this simple language is largely available to everyone who codes with it. Are there any readings that demonstrate how to construct a decompiler from scratch? Parsing files should be a nice job for Perl. Any help is appreciated.

Replies are listed 'Best First'.
Re: Question of Reverse Engineering
by GrandFather (Saint) on Jun 30, 2007 at 22:46 UTC

    The Wikipedia entry for Decompiler gives a good overview including a nice section on Legality.


    DWIM is Perl's answer to Gödel
Re: Question of Reverse Engineering
by garu (Scribe) on Jul 01, 2007 at 06:39 UTC

    Well (assuming it's your own code and you're legally allowed to reverse engineer the output), depending on the instructions your language's compiler emits, the generated binary code might not be as simple as you think or hope: even a hypothetical language with only a '+' operator and no flow-control or loop instructions whatsoever could still produce really messy executables, especially if the compiler has optimization options and if the language supports variables and numbers of several widths.

    As you pointed out yourself, parsing is indeed a nice job for Perl, and there are a lot of modules available to help you out. You should probably take a look at some parsers such as Parse::RecDescent or Parse::Earley, and at modules like Disassemble::X86 (if the binary code was generated for an x86 computer) and Win32::Exe (if it's a 32-bit MS-Windows box).
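    To give a feel for the first step of any decompiler, here is a minimal sketch of a table-driven disassembler. It is written in Python for brevity, but the same loop ports directly to Perl. Everything about the format is an assumption: the opcode numbers, the names PUSH/ADD/CALL/RET, and the operand sizes are all invented, and would have to be replaced with whatever the real compiled format uses.

```python
# Hypothetical opcode table: every opcode value, mnemonic and operand size
# below is invented for illustration, not taken from any real format.
OPCODES = {
    0x01: ("PUSH", 1),  # one-byte immediate operand
    0x02: ("ADD", 0),   # no operand
    0x03: ("CALL", 2),  # two-byte little-endian function index
    0x04: ("RET", 0),   # no operand
}


def disassemble(code: bytes) -> list[str]:
    """Turn a byte string into one text line per decoded instruction."""
    lines = []
    pos = 0
    while pos < len(code):
        opcode = code[pos]
        if opcode not in OPCODES:
            # Unknown byte: emit it raw and keep going, so the gaps in the
            # opcode table show up in the listing instead of aborting.
            lines.append(f"{pos:04x}: .byte 0x{opcode:02x}")
            pos += 1
            continue
        name, size = OPCODES[opcode]
        operand = code[pos + 1 : pos + 1 + size]
        if len(operand) < size:
            # The input ends in the middle of an instruction.
            lines.append(f"{pos:04x}: {name} <truncated>")
            break
        if size:
            value = int.from_bytes(operand, "little")
            lines.append(f"{pos:04x}: {name} {value}")
        else:
            lines.append(f"{pos:04x}: {name}")
        pos += 1 + size
    return lines


if __name__ == "__main__":
    sample = bytes([0x01, 0x02, 0x01, 0x03, 0x02, 0x03, 0x07, 0x00, 0x04])
    print("\n".join(disassemble(sample)))
```

    Emitting unknown bytes as raw data rather than stopping is a deliberate choice here: while the opcode table is still incomplete, the listing itself shows where the table needs to grow.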

    Hope this helps!

Re: Question of Reverse Engineering
by zli034 (Monk) on Jul 01, 2007 at 20:59 UTC
    I did some reading. Reverse engineering isn't that easy, but I still want to figure it out. I am currently writing code with this compiler, so I can compile something as small as a single main() function. Can I decompile the code just by comparing the compiled output with the source code? This language only contains basic variable types, a list of function calls, and if, while, for, return... statements.
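    The comparison idea could be sketched roughly as follows: compile two sources that differ in exactly one small detail (say, one constant in main()) and see which bytes of the output change. The sketch below is in Python and only shows the comparison step; the two byte strings are made-up stand-ins for real compiler output, and the helper name diff_offsets is hypothetical.

```python
def diff_offsets(a: bytes, b: bytes) -> list[tuple[int, int, int]]:
    """Return (offset, byte_in_a, byte_in_b) for every position that differs.

    Only the common prefix length is compared; if the two outputs differ in
    length, the extra tail of the longer one is ignored here.
    """
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__":
    # Made-up stand-ins for two compiled outputs whose sources differ
    # only in one constant (5 versus 6).
    base = bytes.fromhex("7f01050204")
    variant = bytes.fromhex("7f01060204")
    for offset, old, new in diff_offsets(base, variant):
        print(f"offset {offset}: 0x{old:02x} -> 0x{new:02x}")
```

    If a one-constant change in the source moves only one byte in the output, that byte is a good candidate for the constant's encoding. If many bytes move, or the length changes, the compiler is probably doing more than a one-to-one translation, and this simple comparison will not be enough on its own.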