lucylane has asked for the wisdom of the Perl Monks concerning the following question:

In the HTML::Element::traverse documentation, there's a comment I can't figure out:
Now, if you've been reading Structure and Interpretation of Computer Programs too much, maybe you even want a recursive lambda. Go ahead:
{ my $counter = 'x0000'; my $give_id; $give_id = sub { my $x = $_[0]; $x->attr('id', $counter++) unless defined $x->attr('id'); foreach my $c ($x->content_list) { $give_id->($c) if ref $c; # ignore text nodes } }; $give_id->($start_node); undef $give_id; }
It's a bit nutty, and it's still more concise than a call to the traverse method! It is left as an exercise to the reader to figure out how to do the same thing without using a $give_id symbol at all. It is also left as an exercise to the reader to figure out why I undefine $give_id, above; and why I could achieved the same effect with any of:
$give_id = 'I like pie!'; # or... $give_id = []; # or even; $give_id = sub { print "Mmmm pie!\n" };
But not:
$give_id = sub { print "I'm $give_id and I like pie!\n" }; # nor... $give_id = \$give_id; # nor... $give_id = { 'pie' => \$give_id, 'mode' => 'a la' };
I've tried test programs to give me an error or other bad output but everything looks the same (w/ and w/o the undef). This is a construct that I like to use in my code and this comment bothers me. (Do others agree it's a little bit nutty btw?)

Thank you for any wisdom.

Replies are listed 'Best First'.
Re: Q on HTML::Element recursive lambda comment
by fizbin (Chaplain) on Mar 08, 2004 at 19:15 UTC
    I agree that the explanatory text is a bit nutty, but there is a reason. It's not so that the code will work correctly, or better, or whatnot. It's so that you won't cause a memory leak if you reference the code again and again.

    As a demonstration, put the above code inside a sub, and then have something that calls that subroutine, say, 16_000_000 times. (Call it on the same very, very small tree - I recommend a tree that's no more than one element in size)

    Now try the same thing without the undef.

    The difference between the two sets of examples he gives is that the bad examples create a reference from $give_id to itself - that is, they create a recursive data structure. Perl's garbage collector doesn't handle recursive structures, so you shouldn't leave them around after a variable goes out of scope.

      Okay, duh, thanks.

      Another question though; here's why I often use the "anonymous lambda" style (dynamic sub): if I do something like this (embedded sub):

      sub traverse { my $start_node = HTML::TreeBuilder->new; $start_node->parse_file(shift); { my $counter = 'x0000'; sub give_id { my $x = $_[0]; $x->attr('id', $counter++) unless defined $x->attr('id'); foreach my $c ($x->content_list) { give_id($c) if ref $c; # ignore text nodes } }; give_id($start_node); } }
      The var $counter is captured in the closure and does not get reset even though I'd like it to be reset. It's either this way and moving and resetting $counter by hand outside the block (which I won't do because it's logically inside the function and also easy to forget) or using the dynamic sub and risk a memory leak - how would I get the best of both worlds (uncaptured var + no risk of memory leak). Using a private embedded package or putting everything in it's own package seems wordy.

      Thanks for the enlightenment.

        Well, I'd do it by leaving in the undef statement. Either that, or replace it with something commented:
        { ... $give_id->($start_node); $give_id = 0; # Break circular structure and avoid memory leak }
        That is, just make sure to set $give_id to some value that doesn't depend on $give_id before $give_id goes out of scope.
      For what it's worth, I just tried abusing my laptop with this code:
      use strict; use HTML::Element; sub funny { my $counter = 'x0000'; my $give_id; $give_id = sub { my $x = $_[0]; $x->attr('id', $counter++) unless defined $x->attr('id'); foreach my $c ($x->content_list) { $give_id->($c) if ref $c; # ignore text nodes } }; $give_id->($_[0]); undef $give_id; ##### Remove for evil effects ##### } my $a = HTML::Element->new('a', href => 'http://www.perl.com/'); print "Start: ", scalar(localtime), "\n"; for my $i (0..1_000_000) { funny($a); } print "Finish: ", scalar(localtime), "\n";
      With the undef in place, the difference between the "Start" and "Finish" times was 15 seconds. With the undef commented out, the difference between "Start" and "Finish" was only a little over a minute (1:04); however, five minutes after printing the "Finish" line my laptop was still swapping like mad and the shell prompt hadn't yet come back - this indicates that perl was having a devil of a time cleaning things up on exit. (I just gave up and hit Ctrl-C at that point)

      Incidentally, while the version without an undef was running, it took a full 30 seconds for my laptop to display the Ctrl-Alt-Del screen after I gave it the three-finger salute.

      I'm not going to try the 16_000_000 times I suggested above.

Re: Q on HTML::Element recursive lambda comment
by kvale (Monsignor) on Mar 08, 2004 at 19:19 UTC
    First of all, Sean Burke writes eclectic prose and doesn't write code so much as play with code. I love his style :)

    In this bit of the doc, he is rewriting the simple code traversal routine

    { my $counter = 'x0000'; sub give_id { my $x = $_[0]; $x->attr('id', $counter++) unless defined $x->attr('id'); foreach my $c ($x->content_list) { give_id($c) if ref $c; # ignore text nodes } }; give_id($start_node); }
    to be more lispish. In particular, he is assigning an anonymous sub to a lexical, rather than just creating a give_id sub. Doing the former promotes the sub to a first class object, i.e., it can be assigned to a variable and passed around the same as any other object.

    Now, you would expect, $giv_id to go out of scope at the end of the block, but it is used in the anonymous sub, so this 'closure' lives on unless you destroy it explicitly. This may or may not be what you want to do, depending on the rest of your code.

    I think the code is perfectly fine to use. It is not very nutty and should pose no problems. But the non-lispy code is a little simpler.

    -Mark

      Note: I'm (as usual) off-topic...

      I really like it when languages allow me to be lispy -- even though I really dislike being lispy in lisp! Lots of insignificant parenthesis -- Go figure! I'd say this ability of Perl (to allow funky functional constructs without the painful restrictions of pure-functional languages) is 70% of my love of the language. (CPAN and the data structure support fill the other 30%).

      I don't know how many times I've wanted to do something with lamba functions and closures in C, and then I kick myself -- doh -- you can't do that! Java does a little better with the anonymous inner classes, but the syntax is horribly cludgy so I can't give it any browny points. I have this one build engine I'm rather proud of that heavily abuses map and anonymous subs. Why? Well, to keep people from messing with my build system -- err, no, because it's insanely powerful. It is, however, amazing how many self-proclaimed OO Gods can't grok functional code. I like it!

      Back on topic: "Left for an exercise for the reader" is bad form, IMHO, in official documentation or a reference manual. It's fine for a student textbook, but it annoys me to no end when a writer thinks they are clever and then won't explain why they think they are clever. Hey, I'm stupid...sometimes I can't figure out the darn exercises! Teach me, or better, give hints.