in reply to Re: run-time syntax checking
in thread run-time syntax checking

Beware that BEGIN, END, CHECK, INIT, AUTOLOAD, and DESTROY can have a sub keyword in front of them. They can also be prototyped. AUTOLOAD and DESTROY are not relevant for this post, though.

However, CHECK and INIT are relevant! Try this one-liner:   perl -wle'BEGIN { require "browserUk.pl" }' where browserUk.pl is your program above, but with __END__ changed to __DATA__ (why did you use __END__ over __DATA__ anyway?), some code in DATA changed, and with a true return value:

    use strict;

    while(<DATA>) {
        chomp;
        my $code = $_;
        tr/\n//d;
        s[((?:BEGIN|END)\s*{)][sub syntax_check_$1]g;
        eval 'return;' . $_;
        print "'$code' \n\t: ", $@ ? "Fails with\n$@" : 'Passes syntax check';
    }

    1;
    __DATA__
    INIT { print '*** GOTCHA!!! ***'; } # Executes.
    CHECK { print '*** GOTCHA!!! ***'; } # Executes.
    sub BEGIN { print '*** GOTCHA!!! ***'; } # Fails to compile.
    END () { print '*** GOTCHA!!! ***'; } # Executes.
    my $a = 1;
    my $a = cool;
    my $

Another issue would be to take care of use() statements as they're compile-time statements. But if you get rid of use statements, then you might also create compilation errors, since the use() statement might import prototyped subroutines. For instance, perhaps a &cool subroutine prototyped with () was imported in the code above.
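To make the imported-sub problem concrete, here is a minimal sketch. The helper syntax_error and the variable names are mine, and the string-eval'd sub cool() is only a stand-in for whatever a real module might export. It compiles the same snippet twice with the return; trick, once before cool is known and once after.

```perl
use strict;
use warnings;

# Returns the compile error (or '') for a snippet that is compiled but,
# thanks to the leading "return;", never executed.
sub syntax_error {
    my ($code) = @_;
    eval 'return;' . $code;
    return $@;
}

my $snippet = 'my $x = cool;';

# cool is unknown here, so it is a bareword and strict subs rejects it.
my $before = syntax_error($snippet);

# Stand-in for a module that exports a sub with an empty prototype.
eval 'sub cool() { "cool" } 1' or die $@;

# cool is now a known sub, so the very same snippet compiles.
my $after = syntax_error($snippet);

print "before: ", ($before ? "fails" : "passes"), "\n";
print "after:  ", ($after  ? "fails" : "passes"), "\n";
```

So whether the snippet is valid depends on what was imported before it was compiled, which is exactly what gets lost if the use statement is stripped.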

And I wouldn't be surprised if there are more related issues...

ihb

Re: Re: Re: run-time syntax checking
by BrowserUk (Patriarch) on Feb 01, 2003 at 20:44 UTC

    I knew there were other INIT/BEGIN/END type compile-time subs but couldn't remember what they were, and both perldoc and a grep of the html files failed to turn up where these are documented before I posted. Adding CHECK (and the others if need be) to the regex is trivial, as is handling those same keywords prefixed with sub.

    Dealing with use statements seems to be equally simple. Switch use for no. The statement remains syntactically valid but does not cause any processing of the module. Eg. For the following test I replaced the 1; at the end of POSIX.pm with print 'POSIX.pm processed'.$/;

    C:\test>perl -c
    use POSIX qw[ceil floor];
    print ceil($a), floor($b);
    POSIX.pm processed
    no POSIX qw[ceil floor];
    print ceil($a), floor($b);
    ^Z
    - syntax OK

    C:\test>

    As you can see, this change allows the syntax of the statement to be checked without the module it refers to being processed.

    However, the sub defined in and exported from a module with a prototype of () is a problem.

    C:\test>perl -c
    sub cool(){ print 'cool',$/; }
    my $a = cool;
    ^Z
    - syntax OK

    I can't see a way to manoeuvre around that one other than adding a restriction to the code that subs must be invoked with either & or (). Depending on what the OP was trying to achieve, that might be acceptable, but as a general facility, it would suck a lot.

    With regard to the __DATA__ versus __END__. Dunno, sometimes I use one, sometimes the other. For all 'normal' uses it seems to make no difference.

    On which note:), why did you use perl -e'BEGIN{ require "prog.pl" }' instead of prog.pl or perl prog.pl?

    I thunk and thunk and thunk some more and can't see the logic behind that one:).

    test prog as it currently stands

    Output


    Examine what is said, not who speaks.

    The 7th Rule of perl club is -- pearl clubs are easily damaged. Use a diamond club instead.

      a grep of the html files failed to turn up where these are documented before I posted

      They are documented in perlmod, under "Package Constructors and Destructors".

      Dealing with use statements seems to be equally simple. Switch use for no. The statement remains syntactically valid but does not cause any processing of the module.

      Since no is supposed to call the unimport method on the package, the module must be loaded first. Your example only seems to prove the opposite because the module has already been loaded by the use statement. Add another use and you'll see that it's not loaded a second time. Or remove the use statement and you'll see that the module is indeed loaded by the no statement if it hasn't been loaded before.
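A quick way to check this without touching POSIX.pm is sketched below. NoLoadDemo is a hypothetical throwaway module written to a temporary directory just for the demonstration; it defines no unimport method, and perl silently skips the missing unimport call.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical throwaway module, written to a temp directory for the demo.
my $dir = tempdir(CLEANUP => 1);
open my $fh, '>', "$dir/NoLoadDemo.pm" or die "Cannot write module: $!";
print {$fh} "package NoLoadDemo;\n1;\n";
close $fh or die "Cannot close module: $!";

unshift @INC, $dir;

# "no Module" requires the module at compile time and then calls
# Module->unimport, so the file is still loaded.
eval 'no NoLoadDemo; 1' or die $@;

print exists $INC{'NoLoadDemo.pm'} ? "loaded by no\n" : "not loaded\n";
```

On the perls I know of this reports that the module was loaded, because %INC gets an entry for it.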

      Changing use to no can also have nasty side effects, e.g. with use 5.006; or use charnames ':short'; print "\N{greek:Sigma}";. And of course all stricture will be turned off, so the fact that constants are no longer constants but barewords shouldn't cause any trouble anyway. The same goes for no vars qw/.../, and so on. You get the point.

      With regard to the __DATA__ versus __END__. Dunno, sometimes I use one, sometimes the other. For all 'normal' uses it seems to make no difference

      I occasionally find myself putting data in the module itself, e.g. a Parse::RecDescent grammar. I sometimes use the DATA filehandle for that. I can't use __END__ since that opens a DATA filehandle only if it's in the top-level file. That's why I had to change __END__ to __DATA__ when doing require on your program file. This is documented in perldata.

      why did you use perl -e'BEGIN{ require "prog.pl" }' instead of prog.pl or perl prog.pl?

      If you do perl prog.pl the CHECK and INIT blocks will be defined after top-level compilation, and thus it will be too late to run them. But by doing BEGIN { require 'prog.pl' } I define the CHECK and INIT blocks before top-level run-time, and thus make them execute.
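The timing difference can be sketched in a single file. This is my own reduction, not the test program from the thread: one INIT block is string-eval'd inside a BEGIN block (while the main program is still compiling) and one is string-eval'd at run-time. The array @main::ran is just a name I picked to record what executed.

```perl
use strict;
use warnings;

# String-eval during the main program's compile phase: the INIT block is
# queued in time and runs before the main run-time starts.
BEGIN { eval q{ INIT { push @main::ran, 'compile-time eval' } 1 } or die $@ }

# String-eval at run-time: the block compiles, but it is too late for perl
# to run it (with warnings on, perl says "Too late to run INIT block").
eval q{ INIT { push @main::ran, 'run-time eval' } 1 } or die $@;

print "INIT blocks that ran: @main::ran\n";
```

Only the block defined during the compile phase ends up in the list, which is the same effect the BEGIN { require ... } one-liner relies on.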

      You've updated your pattern, but it still doesn't cover BEGIN () { }. (Carefully note that the parentheses in the prototype are balanced. The prototype ((){) will compile.) The pattern has some other issues. It'll make subsub BEGIN { } compile, and sub myBEGIN { } not compile, etc. And what about attributes?
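For illustration, here is what the substitution from the first test program does to a few inputs. Note the assumptions: this is the original pattern, not your updated one, the wrapper sub rewrite is my own name, and I have escaped the literal brace because newer perls complain about unescaped ones.

```perl
use strict;
use warnings;

# The substitution from the first test program, with the brace escaped.
sub rewrite {
    my ($code) = @_;
    $code =~ s[((?:BEGIN|END)\s*\{)][sub syntax_check_$1]g;
    return $code;
}

for my $code ('BEGIN { 1 }', 'BEGIN () { 1 }', 'sub myBEGIN { 1 }') {
    print "$code\n    => ", rewrite($code), "\n";
}
```

The plain BEGIN block is renamed as intended, the prototyped BEGIN () { 1 } passes through untouched (and so would still execute), and sub myBEGIN { 1 } becomes sub mysub syntax_check_BEGIN { 1 }, which no longer compiles.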

      My point with this and the previous post is that it's not as trivial to do this as it might seem. There are a lot of Perl quirks to remember or, more likely, forget.

      ihb

        Thanks for the pointer to perlmod. I knew it was there somewhere, I remembered reading it at some point. I just couldn't find it when I wanted it.

        Shame about no. I can see how I fooled myself in the first simple test, but for the life of me I cannot explain why it didn't show up in the output from the test program. That's the full, unedited output. I can only imagine that I rolled back the change from POSIX.pm and continued testing stuff... Oh well. It does lead to another thought on how to prevent use from having an effect. Basically, instead of changing the use to no, extract the name of the module and preload a local copy of %INC with the module name. That said, it may well be that in the sort of application that I might envision using this for (see further down for a brief description), the use of any use or require or do or INIT/CHECK/BEGIN/END blocks would be prohibited anyway. So simply detecting them and rejecting the code outright would be the sane thing to do. A pre-defined set of 'authorised modules' would be pre-loaded by the main script, and the author(s) of scriptlets (for want of a better phrase) would need to petition for the inclusion of any additional modules to that list.
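A rough sketch of the %INC idea, under my own assumptions: No::Such::Module is a made-up name chosen so that it cannot be found on disk, and syntax_error is a helper name I picked. With the %INC entry localised, require treats the module as already loaded and the missing import is silently skipped.

```perl
use strict;
use warnings;

# Returns the compile error (or '') for a snippet that is compiled but,
# thanks to the leading "return;", never executed.
sub syntax_error {
    my ($snippet) = @_;
    eval 'return;' . $snippet;
    return $@;
}

# Hypothetical module name that should not exist anywhere in @INC.
my $code = 'use No::Such::Module;';

# Without the stub, the compile-time require cannot find the file.
my $plain = syntax_error($code);

my $stubbed;
{
    # Pretend the module is already loaded, for this block only.
    local $INC{'No/Such/Module.pm'} = 'stubbed out';
    $stubbed = syntax_error($code);
}

print "plain:   ", ($plain   ? "fails" : "passes"), "\n";
print "stubbed: ", ($stubbed ? "fails" : "passes"), "\n";
```

This only stops the module from being processed. Anything the module would have imported, prototyped subs included, is still missing, so the cool problem remains.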

        That still wouldn't address the "\N{some:constant}" thing. I have to admit to never even having heard of that until today. I think that the custom translators stuff looks like it might solve a problem I've been having for a while. That's what I love about this site. It always pulls me in new directions and I end up learning a whole bunch of new stuff each time. Ditto Perl. It never stops coming on.

        All pretty academic though, as I still don't see any way around the sub cool () {...} thing.

        __DATA__ vs. __END__: I've never had occasion to embed a __DATA__ section in a module, so I'd never encountered that distinction.

        Gotcha on the INIT/CHECK blocks thing, but as the code being syntax-checked would be eval'd at runtime, the "Too late for ..." warning applies, right?

        I realise that the pattern I used is still lacking. It would probably need some of Abigail-II's Regexp::Common and/or the intrepid Mr. Conway's Text::Balanced or maybe Filter::Simple etc. to do the job properly. My main interest in my original post was the use of return; prefixed to the code as a substitute for the eval 'sub {' . $testcode . '}'; that was mentioned and slated elsewhere as a way of preventing the code from being executed.
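Stripped of the BEGIN/END handling, the return; trick on its own looks something like this (a minimal sketch; the variable names are just for the demonstration):

```perl
use strict;
use warnings;

our $ran = 0;

# Valid code: it is compiled, but "return;" leaves the eval before the
# assignment can run.
eval 'return;' . '$main::ran = 1;';
my $valid_err = $@;

# Broken code: the syntax error is reported at compile time.
eval 'return;' . 'my $x = ;';
my $broken_err = $@;

print "ran: $ran\n";
print "valid snippet:  ", ($valid_err  ? "fails" : "passes"), "\n";
print "broken snippet: ", ($broken_err ? "fails" : "passes"), "\n";
```

The run-time statements never execute, but anything that runs at compile time (BEGIN, use and friends) still does, which is where all the trouble discussed above comes from.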

        The OP didn't make it clear why he wanted to do this, but the typical reason given when this has come up before is that they want to store runtime selectable config information. Something like a manually editable version of the sort of stuff that Data::Dumper throws out.

        Of course, this would be insane in any environment where the code to be eval'd might come from an 'external' source. There is no sense trying to use this kind of mechanism for 'protection'.

        However, for "internal-use-only" situations, given that on many platforms, if the user can run the script that would be doing the syntax-checking and evaling, they can also edit its source, this kind of approach to "Did they make a typo?" has some merit. I think you have proved that there is no possibility of using this mechanism for pre-validating all Perl code, but for the type of use described above, and given some sane subset of the language, it may be possible to come up with something that would fit that bill. If everyone throws up their hands and says "It can't be done!", then it never will.

        To say that there is no situation in which code derived from outside the source of a script should ever be eval'd doesn't, in my eyes at least, cut the mustard. Once you accept that there is any situation in which you might allow code read in from outside the script (excluding, for example, modules in hopefully protected and verified source directories), you have to accept that there is a desire to try and validate that external code to at least some degree before actually running it. If you don't accept any situation in which external code can be eval'd, you have to start to question the presence of language constructs like do 'file'; etc. and possibly even eval itself. Personally, I find the power that eval lends to the language in some situations just too useful to consider doing away with, but the (currently missing) ability to perform some level of syntax checking without actually invoking the code would enhance it to some degree. In reality, it's entirely possible to write total nonsense code that will pass perl -c anyway, and the types of errors that are discovered at run-time rather than compile-time are usually considerably more insidious. I still couldn't say that -c is non-useful though.

        For my own purposes, and the source of my interest in the original post: I want to write, and have started writing, my own editor. I would like to use the power of perl, its OO concepts and a variety of other stuff that I can see how to do easily with perl, from within my editor. (No holy wars, but I will never go back to using Teco-like editors or any of their derivatives!) I would like to be able to write macros, in perl, in the editor, on the fly. I would like to be able to isolate any simple typos and stuff before invoking them. A forlorn hope? I don't think so, but a lot of water will flow under a lot of bridges before I am able to substantiate that. Thank you for your help in moving my ideas a bit closer to their goal.

        <general audience footnote>

        As always, the greatest benefit to me personally of my interactions with perlmonks is that in short order, through the exposure of my thoughts, ideas and code to the assembled populace, I learn more in a day than I would in a week of reading and trial and error. I sincerely relish the debate that my half-baked ideas generate, and truly hope that I haven't pissed too many (more?) people off by continuing this one. It seems a shame to me that open, free debate is sometimes seen as something other than that around PM. I think this place would benefit from a lot more of it.

        </general audience footnote>

