Intercepting compile time blocks like BEGIN {}

LanX has asked for the wisdom of the Perl Monks concerning the following question:

I'm still in shock about this post, Re: Vulnerabilities when editing untrusted code... (Komodo), which shows that it's not trivial to find BEGIN, CHECK and UNITCHECK compile-time blocks (hereafter called "CTBs") by static parsing before the code is executed.

My understanding of what's happening in ''=~('(?{B'.'EGIN{print "owned"}})') is:

In a regex, (?{..}) is treated like an eval block.

Eval blocks also allow CTBs.

The parser/lexer optimizes away the concatenation of literal strings within the regex.

While compiling, the BEGIN block is executed.

After some meditation I think that a mechanism to intercept the execution of CTBs is a necessary feature request.

It would be beneficial to have something like a command line switch that makes perl print all CTBs instead of evaling them and, by default, stop without continuing.

Something like perl -cc, maybe extendable by hooking in a function that handles the code string: perl -cc='my ($code, $phase, $file, $line) = @_; print $code; 0'.

The return code of those callbacks could be taken to decide whether the process continues (e.g. based on a file's ownership, path or certificate).²

AFAIK CTBs are evaled¹, so in theory it should be easily possible to intercept the evaling routine to do this.

The possible benefits are:

Debugging of CTBs³

automated testing if code can be safely syntax checked without executing code

I took a look into Safe, but it doesn't seem that this case is covered ... or is it possible to hook into eval to achieve this?

Or is there already another possibility that I missed?

Cheers Rolf

1) Well, not quite ... from perlmod:

       It should be noted that "BEGIN" and "UNITCHECK" code blocks are
       executed inside string "eval()"'s.  The "CHECK" and "INIT" code
       blocks are not executed inside a string eval, which e.g. can be
       a problem in a mod_perl environment.

2) It could also return other code to be used instead, e.g. to wrap the given code into "use Safe;" and "no Safe;" statements.

3) Including tracing and investigating CTBs of alien code.


Re: Intercepting compile time blocks like BEGIN {} by ikegami (Patriarch) on Aug 09, 2010 at 14:34 UTC
> automated testing if code can be safely syntax checked without executing code

Actually, disabling BEGIN blocks would greatly reduce the value of a syntax check. For example, it would

- introduce errors due to missing imports or missing prototypes
- remove warnings and errors from warnings and strict, unless you conditionally allow the use of certain modules
- remove warnings relating to imports
- add warnings due to missing globals

Also, it would prevent syntax checking a module, as that requires executing `require`.

EPIC uses PPI to parse the script without executing anything. It does a great job of finding errors reliably.

Anyway, I don't see the problem. If you've installed the module, you've already accepted its evilness. I don't see what good a syntax check of an untrusted module would do. Just like you wouldn't execute it, don't do a syntax check on it.
Re^2: Intercepting compile time blocks like BEGIN {} by LanX (Saint) on Aug 09, 2010 at 21:24 UTC
> Actually, disabling BEGIN blocks would greatly reduce the value of a syntax check. For example, it would
> ...

I'm aware of this, but that's exactly why I was describing a call-back function to control the process. For instance the filepath could be taken to make a distinction between trusted and new code. And rurban's suggestion to wrap the code into a Safe environment could be chosen to allow execution of BEGIN blocks in untrusted code.

> EPIC uses PPI to parse the script without executing anything.

Tell me, PPI can find BEGIN-Blocks like in ''=~('(?{B'.'EGIN{print "owned"}})') ? AFAIK PPI cannot deal with all kinds of syntax changing mechanisms. So it wouldn't be of much help when searching for evil code, since attackers could use these limitations.

Cheers Rolf
Re^3: Intercepting compile time blocks like BEGIN {} by ikegami (Patriarch) on Aug 09, 2010 at 22:00 UTC
> For instance the filepath could be taken to make a distinction between trusted and new code.

How does that help? Three of the four examples I gave still stand, and you still can't syntax check a module.

> And rurban's suggestion to wrap the code into a Safe environment

Safe is considered not safe.

> Tell me, PPI can find BEGIN-Blocks like in ''=~('(?{B'.'EGIN{print "owned"}})') ?

It shows as a regex literal, which sounds good to me.

> So wouldn't be of much help when searching for evil code, since attackers could use these limitations.

Using PPI removes the need to detect such attacks. The only reason you need to detect the attacks is that your method is susceptible to them.
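
The static approach discussed in this sub-thread can be sketched as follows. This is only an illustration under stated assumptions: the CPAN module PPI must be installed (the snippet reports so if it is not), the sample file and its contents are hypothetical, and `PPI::Statement::Scheduled` is the PPI class for BEGIN/CHECK/UNITCHECK/INIT/END blocks. PPI parses the file without compiling or running it, so I would expect it to list only the literal BEGIN block and not the one assembled inside the regex string.

```shell
# Hypothetical sample: one visible BEGIN block, one hidden in a regex string.
sample=$(mktemp)
cat > "$sample" <<'EOF'
BEGIN { print "visible\n" }
''=~('(?{B'.'EGIN{print "owned"}})');
EOF

if perl -MPPI -e1 2>/dev/null; then
    # PPI only parses the file; nothing in it is compiled or executed.
    found=$(perl -MPPI -e '
        my $doc = PPI::Document->new($ARGV[0]) or die "PPI could not parse the file\n";
        my $blocks = $doc->find("PPI::Statement::Scheduled") || [];
        print $_->type, "\n" for @$blocks;
    ' "$sample")
else
    found="PPI not installed"
fi
echo "$found"
rm -f "$sample"
```

This illustrates both positions: nothing gets executed while inspecting the file, but a scan for scheduled blocks alone does not reveal the hidden one.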
Re^4: Intercepting compile time blocks like BEGIN {} by LanX (Saint) on Aug 09, 2010 at 22:04 UTC
Re^5: Intercepting compile time blocks like BEGIN {} by ikegami (Patriarch) on Aug 09, 2010 at 23:13 UTC
Some notes below your chosen depth have not been shown here
Re: Intercepting compile time blocks like BEGIN {} by locked_user sundialsvc4 (Abbot) on Aug 09, 2010 at 13:16 UTC
“Sometimes, cleverness is not a virtue.”

Sometimes, the products of “cleverness” prove to be quite uncontrollable. In my humble opinion, `BEGIN` blocks are one of those things. And, if we then try to “intercept” them, so as to prevent them from doing what we don’t want them to do in this-case or that, “well, we have only made matters worse, haven’t we?”

I prize one characteristic of good source code above all others: clarity. In such code, I am able to quickly read the code and to ascertain, with a very high degree of confidence, that I actually know what it is actually telling the computer to do, and that the computer will actually interpret it in just that way. This idea of “intercepts” would, IMHO, unfortunately just serve to make the code even more inscrutable than it already may be.

Of course I do not mean the foregoing to be “a blanket statement, true in every case as though it were inscribed by a divine hand in some stone tablets.” Instead, call it a rule-of-thumb, offered by a thumb that has been whacked with a hammer too many times.
Re^2: Intercepting compile time blocks like BEGIN {} by LanX (Saint) on Aug 09, 2010 at 13:39 UTC
BEGIN blocks are a crucial part of the `use` mechanism and responsible for much of the flexibility many CPAN modules can offer.

IMHO other so-called "clear" languages/products just offer a multitude of specialized mechanisms which aren't really better controllable when using foreign libraries. I doubt that those mechanisms are better suited, because normally they diminish flexibility without really enforcing security.

What's needed is a mechanism to define and enforce the personal level of trust; that's why I want to be able to hook a call-back into the executions at compile-time.

Perl's debugger already has many possibilities to hook call-backs into various aspects and phases of execution; this would only complete this set of possibilities for debugging and introspection.

Cheers Rolf

UPDATE:

> This idea of “intercepts” would, IMHO, unfortunately just serve to make the code even more inscrutable than it already may be.

Which code are you talking about? I was talking about a command line switch, not about an extension of the Perl syntax. There is no `use intercept` intended!
Re: Intercepting compile time blocks like BEGIN {} by ikegami (Patriarch) on Aug 09, 2010 at 23:53 UTC
> AFAIK CTBs are evaled, so in theory it should be easily possible to intercept the evaling routine to do this.

eval() isn't used. But something is. Does it really matter what that something is?

> well not quite ... from perlmod

It refers to:

    $ perl -E'eval "BEGIN { say q{foo} }"'
    foo
    $ perl -E'eval "UNITCHECK { say q{foo} }"'
    foo
    $ perl -E'eval "CHECK { say q{foo} }"'
    $ perl -E'eval "INIT { say q{foo} }"'
    $

But it's not completely true. The problem is that `eval` would normally be used after the CHECK and INIT blocks have triggered. If you use `eval` earlier, all four blocks work.

    $ perl -E'BEGIN { eval "CHECK { say q{foo} }" }'
    foo
    $ perl -E'BEGIN { eval "INIT { say q{foo} }" }'
    foo
Re^2: Intercepting compile time blocks like BEGIN {} by LanX (Saint) on Aug 10, 2010 at 00:03 UTC
> eval() isn't used. But something is. Does it really matter what that something is?

Theoretically no, practically yes, because extending an eval mechanism shouldn't be difficult. "Something", OTOH, could mean anything much more complicated.

Cheers Rolf
Re^3: Intercepting compile time blocks like BEGIN {} by ikegami (Patriarch) on Aug 10, 2010 at 01:06 UTC
Quite the opposite. `eval` is an op. It's meant to be called from Perl land. This may subject it to limitations and make it a poor choice. Limiting yourself to a specific implementation (without knowing anything about it) is definitely not better.
Re: Intercepting compile time blocks like BEGIN {} by Anonymous Monk on Aug 09, 2010 at 14:11 UTC
Might the Safe module help in this case? Maybe you could allow init to run, as long as it just does init in memory... If it keeps its handles to itself and doesn't touch the disk/network, is that ok?
Re^2: Intercepting compile time blocks like BEGIN {} by LanX (Saint) on Aug 09, 2010 at 14:26 UTC
Well, that's one of my questions. :) As I said, "I took a look into Safe" but couldn't figure out how to achieve this goal. Safe seems mainly to intercept dedicated opcodes; are there special opcodes for CTBs? AFAIK opcodes are executed after compilation (to opcodes), so it should already be too late.

Cheers Rolf