I can't manage to post a comment on "Improve your extracted transversal" on chromatic's blog, so I'll put my questions and remarks here, hoping to learn through discussion. You won't understand my questions unless you have read the original posts.

First, what would happen if we split the process into two subroutines? For example (untested code):

    use List::Util qw(reduce);

    ## Concatenates the texts retrieved in a node's (possibly nested) descendants
    # Note: preserve recursion with process_text_in($node)
    sub get_all_text_in {
        my $node = shift;

        # Concatenate the texts extracted from each child node;
        # start from '' so that the first child gets processed too
        return reduce { $a . process_text_in($b) } '', $node->content_list;
    }

    ## Get the text in a node, processing it if needed
    # Note: preserve recursion with get_all_text_in($node)
    sub process_text_in {
        my $node = shift;

        # Just text => get it
        return $node unless ref $node;

        # Not a special tag => get its children's texts
        my $tag = $node->tag;
        return get_all_text_in($node) unless $action{$tag};

        # Special tag => process it accordingly
        return $action{$tag}->($node);
    }

I feel like there would be both benefits and drawbacks. Is that right? Did I forget something?

Having it all in one place feels nice and safe, but so does keeping the logic very simple. Maybe for a recursion, keeping it all in one place matters more.
My question is: in terms of reference cycles, what would the consequence be? Would my solution let us drop the "undef $traverse"?
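
To make the question concrete, here is a minimal sketch of the cycle as I understand it. It is not chromatic's code: a toy hash-based tree stands in for the real node objects, and the names text_via_closure and text_via_named are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical toy tree: a node is { tag => ..., content => [...] },
# and a piece of text is a plain string.
my $tree = {
    tag     => 'p',
    content => [ 'Hello ', { tag => 'a', content => ['world'] } ],
};

# Closure version: the anonymous sub captures $traverse, and $traverse
# holds the sub, so each keeps the other alive (a reference cycle).
sub text_via_closure {
    my $root = shift;
    my $traverse;
    $traverse = sub {
        my $node = shift;
        return $node unless ref $node;
        return join '', map { $traverse->($_) } @{ $node->{content} };
    };
    my $text = $traverse->($root);
    undef $traverse;    # break the cycle so the sub can be freed
    return $text;
}

# Named-sub version: the sub calls itself by name, so no lexical
# variable holding a code ref is captured and nothing needs undef.
sub text_via_named {
    my $node = shift;
    return $node unless ref $node;
    return join '', map { text_via_named($_) } @{ $node->{content} };
}

print text_via_closure($tree), "\n";
print text_via_named($tree),   "\n";
```

If this reading is right, the two named subroutines above would behave like text_via_named: they find each other through the symbol table rather than through a captured variable.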

Next, a related question: because both entries (a, p) in the %action hash start with $traverse->($node), it would be possible to take that call out of the hash, which I think would solve one of the weak reference problems. (Would it?) A good reason not to do that would be if you later add entries to the hash that don't start with $traverse->($node). I am not familiar enough with HTML parsing to know whether this is likely to happen. Is it?
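
Here is a rough sketch of what I mean by taking the call out of the hash. The formatting rules, the href field, and the names %format and text_in are assumptions of mine for illustration, and a toy hash-based tree again stands in for the real nodes.

```perl
use strict;
use warnings;

# Assumed formatting rules, for illustration only. Each entry receives the
# node and the already-collected text of its children, so no entry needs
# to call back into the traversal.
my %format = (
    a => sub { my ( $node, $text ) = @_; $text . ' [' . $node->{href} . ']' },
    p => sub { my ( $node, $text ) = @_; $text . "\n\n" },
);

sub text_in {
    my $node = shift;

    # Just text => get it
    return $node unless ref $node;

    # The traversal happens once, here, for every kind of tag
    my $text = join '', map { text_in($_) } @{ $node->{content} };

    # Special tag => post-process the collected text
    my $formatter = $format{ $node->{tag} };
    return $formatter ? $formatter->( $node, $text ) : $text;
}

my $tree = {
    tag     => 'p',
    content => [
        'See ',
        { tag => 'a', href => 'http://example.com', content => ['this'] },
    ],
};
print text_in($tree);
```

The limitation is the one raised above: an entry that must do something before, or instead of, visiting its children no longer fits this shape.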

And finally, if the answer to the last question is yes, are there other ways than using a hash of anonymous subs? What if the hash contained references to named subroutines, which would call get_all_text_in($node)? Would we still have a leak? Sorry, I'm not clear enough on how leaks work. What should I read?
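
A sketch of the named-subroutine variant, to show what I have in mind. The handlers link_text and paragraph_text and the href field are hypothetical, and the toy hash-based nodes replace the real node objects so that the example runs on its own.

```perl
use strict;
use warnings;

# The hash holds references to named subs. As far as I understand, these
# subs do not capture any lexical code ref, so no cycle is built here.
my %action = (
    a => \&link_text,
    p => \&paragraph_text,
);

sub get_all_text_in {
    my $node = shift;
    return join '', map { process_text_in($_) } @{ $node->{content} };
}

sub process_text_in {
    my $node = shift;
    return $node unless ref $node;                 # just text
    my $handler = $action{ $node->{tag} };
    return $handler ? $handler->($node) : get_all_text_in($node);
}

# Hypothetical handlers: each one calls the traversal by name.
sub link_text {
    my $node = shift;
    return get_all_text_in($node) . ' [' . $node->{href} . ']';
}

sub paragraph_text {
    my $node = shift;
    return get_all_text_in($node) . "\n\n";
}

my $tree = {
    tag     => 'p',
    content => [
        'See ',
        { tag => 'a', href => 'http://example.com', content => ['this'] },
    ],
};
print process_text_in($tree);
```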

Thank you for any insight or discussion.


In reply to Questions about Recursion and "Extract your transversal", by chromatic by mascip
