I frequently write code that generates anonymous functions on the fly. However, I often want to verify that these functions are correct without executing them. To this end, I've started writing Test::Code. Here's the start of my test suite (more or less):

use Test::More 'no_plan';
BEGIN { use_ok 'Test::Code' or die }
ok defined *::is_code{CODE}, '&is_code should be exported to our namespace';
is_code sub { 1 }, sub { 1 }, 'Basic subs should match';
is_code sub { 2 }, sub { 1+1 }, '... even if the exact text is not the same';
is_code sub { print((3 * 4) + shift) }, sub { print 3 * 4 + shift },
    '... and parens that do not affect the meaning should work';
ok defined *::isnt_code{CODE}, '&isnt_code should be exported to our namespace';
# How many people would spot the following bug, eh? It's something
# I know I've fallen victim to ...
isnt_code sub { print (3 * 4) + shift }, sub { print 3 * 4 + shift },
    '... and parens that do affect the meaning should fail';
isnt_code sub { print for 1 .. 4 }, sub { for (1 .. 4) { print } },
    'Subtle lexical issues should cause the match to fail (darn it)';

The last example really bugs me. I'd like for that to work, but it doesn't. Also, variables with different names will fail, even if the code is functionally identical. I'm currently using B::Deparse to handle this, but in the long run, I'd really prefer to be able to use PPI::Normal and fall back to B::Deparse.
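A minimal sketch of the B::Deparse comparison, assuming it boils down to comparing the text returned by coderef2text (the helper deparse_eq is hypothetical, not Test::Code's actual code). Neither sub is ever called. Constant folding lets sub { 2 } match sub { 1 + 1 }, while simply renaming a lexical breaks the match:

```perl
use strict;
use warnings;
use B::Deparse;

# Hypothetical helper: two subs "match" if B::Deparse renders them to
# identical source text. Neither sub is executed.
my $deparse = B::Deparse->new;

sub deparse_eq {
    my ( $got, $expected ) = @_;
    return $deparse->coderef2text($got) eq $deparse->coderef2text($expected);
}

# 1 + 1 is constant-folded to 2 at compile time, so these deparse identically.
my $folded = deparse_eq( sub { 2 }, sub { 1 + 1 } );

# Functionally identical, but the lexical has a different name, so the
# deparsed text differs and the comparison fails.
my $renamed = deparse_eq(
    sub { my $x = shift; return $x * 2 },
    sub { my $y = shift; return $y * 2 },
);

print "folded:  ", ( $folded  ? "match" : "no match" ), "\n";
print "renamed: ", ( $renamed ? "match" : "no match" ), "\n";
```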

Right now, this test module is not as useful as I would like due to the caveats listed above. Suggestions welcome.

Cheers,
Ovid

New address of my CGI Course.

Re: Test::Code
by diotalevi (Canon) on Aug 11, 2005 at 22:19 UTC
    How are you using B::Deparse? From the sounds of it, B::Deparse is a better long-term bet than PPI::Normal, because what you want to compare is the compiled structure, not whatever PPI is capable of parsing. That is, you want something that examines the optree, and PPI isn't that kind of tool.

    I'd offer up a B::Lisp for serializing optrees to Lisp data, but I don't feel like posting another module to CPAN. If you'd find it useful, I could post it and then defer to anyone who actually cared to maintain it.

      PPI::Normal has the intention, in the long run, of normalizing Perl in such a way that functionally equivalent code will present the same DOM tree. Originally PPI::Normal was close to that, but further development of PPI has scaled it back somewhat.

      As for your question about B::Deparse, currently I'm using it like this:

      use B::Deparse;
      my $deparse = B::Deparse->new(
          "-p",     # add extra parentheses
          "-q",     # expand double-quoted strings
          "-sC",    # cuddle else/elsif/continue blocks
          "-x3",    # expand syntax constructs
      );

      Unfortunately, that doesn't handle the case of variables having different names, even if the code is functionally equivalent. Your Lisp solution sounds interesting, but I wonder how I would present the got/expected failure information. Not too many people are going to want to look at a Lisp equivalent.
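One possible answer to the got/expected question, sketched under the assumption that deparsed Perl is readable enough as a failure message: hand the two deparsed strings to Test::More's is(), so a failure prints both sides as Perl source. The name code_is is hypothetical (it is one of the names suggested later in this thread); the Deparse options are the ones listed above.

```perl
use strict;
use warnings;
use B::Deparse;
use Test::More;

# Hypothetical code_is(): compare the deparsed text of two subs with is(),
# so a failure shows got/expected as Perl rather than as an op dump or Lisp.
my $deparse = B::Deparse->new( '-p', '-q', '-sC', '-x3' );

sub code_is {
    my ( $got, $expected, $name ) = @_;
    # Make a failure report the caller's line, not this helper's.
    local $Test::Builder::Level = $Test::Builder::Level + 1;
    return is( $deparse->coderef2text($got),
        $deparse->coderef2text($expected), $name );
}

my @results = (
    code_is( sub { print 3 * 4 + shift }, sub { print((3 * 4) + shift) },
        'parens that do not change the meaning' ),
    code_is( sub { 2 }, sub { 1 + 1 }, 'constant folding' ),
);
done_testing();
```

If the two texts differ, is() prints both deparsed strings in its usual "got:" / "expected:" diagnostics.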

      Cheers,
      Ovid


        PPI::Normal has the intention, in the long run, of normalizing Perl in such a way that functionally equivalent code will present the same DOM tree.
        Okay...maybe I'm missing something, but isn't that nigh impossible? Consider these two very simple functions:
        sub alpha {
            foreach (1..10) { print; }
        }
        sub beta {
            for (my $i = 10; $i >= 1; $i--) { print 11 - $i; }
        }
        Both do the exact same thing, albeit in different ways. I'd be interested to see some sort of automated solution that reduces both to the same thing at any level.
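To make the alpha/beta point concrete, here is a sketch that captures each sub's output through an in-memory filehandle and also compares their B::Deparse text. The helper captured_output is hypothetical. Behaviourally the two subs agree; structurally they do not, which is the gap any normalizer would have to close.

```perl
use strict;
use warnings;
use B::Deparse;

sub alpha { foreach (1 .. 10) { print; } }
sub beta  { for (my $i = 10; $i >= 1; $i--) { print 11 - $i; } }

# Hypothetical helper: run a sub with its default output handle redirected
# into a string via an in-memory filehandle, and return what it printed.
sub captured_output {
    my ($code) = @_;
    open my $fh, '>', \my $buffer or die "Cannot open in-memory handle: $!";
    my $old = select $fh;
    $code->();
    select $old;
    close $fh;
    return $buffer;
}

my $deparse     = B::Deparse->new;
my $same_output = captured_output( \&alpha ) eq captured_output( \&beta );
my $same_source = $deparse->coderef2text( \&alpha ) eq $deparse->coderef2text( \&beta );

print "same output: ", ( $same_output ? "yes" : "no" ), "\n";    # yes
print "same source: ", ( $same_source ? "yes" : "no" ), "\n";    # no
```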

        thor

        Feel the white light, the light within
        Be your own disciple, fan the sparks of will
        For all of us waiting, your kingdom will come

        For the variable names, what about trying something similar to B::Deobfuscate? Take the deparsed code and walk through it, renaming each variable in turn consistently but abstractly. E.g. if the first variable encountered is "$foo", replace all "[$@%&*]foo" with "${1}var1" everywhere. In other words, wherever "foo" is used as a symbol to refer to a variable, replace it with something predictable. So even if the other piece of code uses "bar" instead of "foo", as long as the symbol exists in the same semantic place in the deparsed code, it will get replaced the same way by "var1".

        My mind boggles at the regex challenge of doing this sanely on Perl code, so the rest of the solution is left as an exercise for the reader.
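A naive sketch of that renaming idea, applied to B::Deparse output. normalize_vars is a hypothetical helper: it only handles the $, @ and % sigils, leaves $_ and @_ alone, knows nothing about other special variables, and, being a plain regex, will also rewrite sigil-like text inside string literals or sprintf formats.

```perl
use strict;
use warnings;
use B::Deparse;

# Hypothetical normalize_vars(): rename every $, @ or % variable to var1,
# var2, ... in order of first appearance, keeping the sigil, so $foo and
# @foo both become var1. $_ and @_ are left untouched.
sub normalize_vars {
    my ($text) = @_;
    my %alias;
    my $count = 0;
    $text =~ s{([\$\@%])([A-Za-z_]\w*)}{
        my ( $sigil, $name ) = ( $1, $2 );
        $name eq '_' ? "$sigil$name"
                     : $sigil . ( $alias{$name} ||= 'var' . ++$count );
    }ge;
    return $text;
}

my $deparse = B::Deparse->new;
my $one = normalize_vars( $deparse->coderef2text( sub { my $x = shift; return $x * 2 } ) );
my $two = normalize_vars( $deparse->coderef2text( sub { my $y = shift; return $y * 2 } ) );
print $one eq $two ? "match\n" : "no match\n";    # match
```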

        The other thing that occurs to me is not bothering with B::Deparse at all, but going straight back to B and comparing the op trees directly. Simon Cozens has some examples of walking the op tree in Advanced Perl Programming (2nd ed.). For example (almost straight from the text):

        use B;
        my $subref = sub {
            # some subroutine
        };
        my $cv = B::svref_2object( $subref );
        my $op = $cv->START;
        do {
            print B::class($op) . " : " . $op->name . " (" . $op->desc . ")\n";
        } while $op = $op->next and not $op->isa("B::NULL");

        B:: is way over my head, but the notion of walking the two trees and comparing operations directly seems like it might make it easier than worrying about interpreting the deparsed version of the same thing. (Let perl parse Perl.)
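A sketch of comparing op trees directly, in the spirit of the example above: walk each sub's ops in execution order and keep only their names. op_names is a hypothetical helper, and the approach is deliberately coarse. Because lexical names live in the pad rather than in the ops, renamed variables compare equal; but so would sub { 1 } and sub { 2 }, since operands such as constant values are ignored.

```perl
use strict;
use warnings;
use B qw(svref_2object);

# Hypothetical op_names(): follow START and then ->next through a sub's ops
# (execution order) and return their names joined into one string.
# Operands (constant values, pad slots) are not included.
sub op_names {
    my ($code) = @_;
    my @names;
    my $op = svref_2object($code)->START;
    while ($$op) {    # a null op (B::NULL) has an address of 0
        push @names, $op->name;
        $op = $op->next;
    }
    return join ' ', @names;
}

my $x_version = op_names( sub { my $x = shift; return $x * 2 } );
my $y_version = op_names( sub { my $y = shift; return $y * 2 } );
print "$x_version\n";
print $x_version eq $y_version ? "same ops\n" : "different ops\n";    # same ops
```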

        -xdg

        Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.

        I have a solution for you but I haven't finished it yet. I wrote Data::Postponed so I could abstract off the symbol renaming part of B::Deobfuscate into something else. The interesting effect of that is you'd end up with a tree of perl syntax with placeholders wherever a symbol name went. Normally you'd just let the values interpolate in but if you wished, you could change the values or just dump out the intermediate structure.

        Consider this. It's how I remember things being dumped during simple debugging. Obviously a more convenient debug output could be provided.

        (. (. (. (. "sub " SUBNAME) " {\n    print ") FOO ) ";\n}")
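To illustrate the placeholder idea, here is a hypothetical Perl structure mirroring the dump above. It is not Data::Postponed's actual representation: concatenation nodes are [ '.', LEFT, RIGHT ], literal text is a plain string, and a placeholder is a reference to its name. Rendering two pieces of code with the same canonical names (sub1, $var1) is what would make them comparable.

```perl
use strict;
use warnings;

# Hypothetical tree mirroring the dump: every concatenation is
# [ '.', LEFT, RIGHT ], literals are strings, placeholders are \'NAME'.
my $tree = [ '.',
    [ '.',
        [ '.',
            [ '.', "sub ", \'SUBNAME' ],
            " {\n    print " ],
        \'FOO' ],
    ";\n}" ];

# Flatten the tree, filling each placeholder from the %$names mapping.
sub render {
    my ( $node, $names ) = @_;
    return $node unless ref $node;    # literal text
    if ( ref $node eq 'SCALAR' ) {    # placeholder
        my $placeholder = $$node;
        die "No value for placeholder $placeholder\n"
            unless exists $names->{$placeholder};
        return $names->{$placeholder};
    }
    my ( undef, $left, $right ) = @$node;    # [ '.', LEFT, RIGHT ]
    return render( $left, $names ) . render( $right, $names );
}

print render( $tree, { SUBNAME => 'greet', FOO => '$message' } ), "\n";
print render( $tree, { SUBNAME => 'sub1',  FOO => '$var1' } ), "\n";
```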
Re: Test::Code
by adrianh (Chancellor) on Aug 12, 2005 at 10:10 UTC
    However, I often want to verify that these functions are correct without executing them

    I have to admit that I find this intriguing. I'm always more interested in what my functions do than what they look like. Any chance of expanding on why you find this useful?

    Naming niggle: I'd expect is_code to be a test for something being a subroutine, rather than a test for equality. Maybe code_is or code_eq?

      I had the same question. I suspect the examples above are abstracted from a more complex real-world problem. In what situation is this useful? (Code-generating code? Code with major side effects?)

      And for the record, I also think "code_is" or "code_eq" would be better. Or, if you're using B::Deparse, perhaps even "deparse_eq", since that most accurately describes what you're doing (comparing the parse trees of two separate pieces of code), as opposed to checking whether two code refs point to the same piece of code or anything along those lines.

      -xdg


      I agree that the name is bad. I do plan to change it. As for why I might not want to execute the code, think about "HOP" (Higher-Order Perl). If you were to try and write tests for the parser, you might find it quite a bit easier to write eq_code $code, $expected; than to constantly create correct subs to pass into code and make them work.

      A problem can also arise when the code being generated does things that you'd rather not execute. Maybe it starts pulling a value off an iterator but you don't want that to happen yet. Maybe it deletes files, closes a filehandle or does other cleanup work that shouldn't happen yet.

      Cheers,
      Ovid


        If you were to try and write tests for the parser, you might find it quite a bit easier to write eq_code $code, $expected; than to constantly create correct subs to pass into code and make them work.

        What I'd probably do would be to start with tests asking it to parse some strings - and see where that would take me. I've test-firsted parsers in the past like this without too much difficulty. I might try it your way next time to see what it's like :-)

        A problem can also arise when the code being generated does things that you'd rather not execute. Maybe it starts pulling a value off an iterator but you don't want that to happen yet. Maybe it deletes files, closes a filehandle or does other cleanup work that shouldn't happen yet.

        In this case I'd want the dangerous stuff off somewhere I could use mock objects and/or dependency injection to make it safely testable.
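A minimal sketch of that dependency-injection approach for generated code with side effects: the hypothetical make_cleanup generator takes its deleter as a parameter, so a test can pass in a recorder instead of letting unlink touch the disk.

```perl
use strict;
use warnings;

# Hypothetical generator: the returned cleanup sub calls whatever deleter
# it was given, defaulting to a real unlink.
sub make_cleanup {
    my (%args) = @_;
    my $delete = $args{delete}
        || sub { unlink $_[0] or warn "unlink $_[0]: $!\n" };
    my @files = @{ $args{files} };
    return sub { $delete->($_) for @files; return scalar @files };
}

# In a test, inject a recorder so nothing on disk is touched.
my @deleted;
my $cleanup = make_cleanup(
    files  => [ 'a.tmp', 'b.tmp' ],
    delete => sub { push @deleted, $_[0] },
);
my $count = $cleanup->();
print "would delete: @deleted\n";    # would delete: a.tmp b.tmp
```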