Questions about Recursion and "Extract your transversal", by chromatic

mascip has asked for the wisdom of the Perl Monks concerning the following question:

I can't manage to comment on "Improve your extracted transversal" on chromatic's blog, so I'll ask my questions and make my remarks here, hoping to learn through discussion. You won't understand my questions unless you have read the original posts.

First, what would happen if we split the process into two subroutines? For example (untested code):

## Concatenates the texts retrieved in a node's (possibly nested) descendants
# Note: preserve recursion with process_text_in($node)
use List::Util qw(reduce);

sub get_all_text_in {
  my $node = shift;

  # Concatenate the texts extracted from each child node,
  # starting from '' so that the first child is processed too
  return reduce { $a . process_text_in($b) } '', $node->content_list;
}

## Get the text in a node, processing it if needed
# Note: preserve recursion with get_all_text_in($node)
sub process_text_in {
  my $node = shift;

  # Just text => get it
  return $node unless ref $node;

  # Not a special tag => get its children texts
  my $tag = $node->tag;
  return get_all_text_in($node) unless $action{$tag};

  # Special tag => process it accordingly
  return $action{$tag}->($node);
}

I feel like there would be both benefits and drawbacks:

+ separation of concerns: concatenation, text processing
+ simpler logic (no need for if/else, nor $text)

- we lose __SUB__, which makes it clear that we are dealing with recursion
- we lose the value of "having it all in one place" (that's why I added the "Note" comments)

Is that right? Did I forget something?

Having it all in one place feels nice and safe, but so does keeping the logic very simple. Maybe for recursion, keeping it all in one place is more critical.
My question is: in terms of reference cycles, what would the consequence be? Would my solution remove the need for the "undef $traverse"?
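For readers who want to see the cycle in question, here is a minimal sketch, not the code from chromatic's posts. It assumes the usual pattern of a lexical $traverse holding an anonymous sub that calls itself through $traverse. The Canary class and closure_was_freed() are hypothetical helpers invented for this illustration: the closure captures a Canary object, so the object's DESTROY only runs once the closure itself has been freed.

```perl
use strict;
use warnings;

# Hypothetical helper: sets a flag when the object is destroyed
package Canary;
sub new     { my ($class, $flag_ref) = @_; return bless { flag => $flag_ref }, $class }
sub DESTROY { ${ $_[0]{flag} } = 1 }

package main;

# Returns 1 if the recursive closure was freed at scope exit, 0 if it leaked
sub closure_was_freed {
    my ($break_cycle) = @_;
    my $freed = 0;
    {
        my $canary = Canary->new(\$freed);
        my $traverse;
        $traverse = sub {
            my $depth = shift;
            my $class = ref $canary;    # makes the closure capture $canary
            # The closure refers to $traverse, and $traverse refers to the closure
            return $depth <= 0 ? 0 : 1 + $traverse->($depth - 1);
        };
        $traverse->(3);
        undef $traverse if $break_cycle;    # drops the only outside reference
    }
    return $freed;
}

print "without undef: freed=", closure_was_freed(0), "\n";    # freed=0
print "with undef:    freed=", closure_was_freed(1), "\n";    # freed=1
```

Two named subroutines that call each other by name hold no such reference to themselves in a lexical, so there is nothing to undef.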

Next, a related question: because both entries (a, p) in the %action hash start with $traverse->($node), that call could be taken out of the hash, which I think would solve one of the weak reference problems. (Would it?) A good reason not to do that would be if you later add entries to the hash that don't start with $traverse->($node). I am not familiar enough with HTML parsing to know whether this is likely to happen. Is it?

And finally, if the answer to the last question is yes, are there other ways than using a hash? What if the hash contained references to named subroutines, which would call get_all_text_in($node)? Would we still have a leak? Sorry, I'm not clear enough on how leaks work. What should I read?
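A runnable sketch of that last idea, under stated assumptions: MockNode is a hypothetical stand-in for the real node class and provides only tag and content_list, the handlers handle_link and handle_paragraph are invented for illustration, and join/map is used in place of reduce to keep the example short. The hash values are references to named subroutines, so no closure over a $traverse variable is involved.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real node class (only the two methods used here)
package MockNode;
sub new          { my ($class, $tag, @content) = @_; return bless { tag => $tag, content => [@content] }, $class }
sub tag          { $_[0]{tag} }
sub content_list { @{ $_[0]{content} } }

package main;

# Named handlers: each one recurses by calling get_all_text_in() by name
sub handle_link      { my $node = shift; return '[' . get_all_text_in($node) . ']' }
sub handle_paragraph { my $node = shift; return get_all_text_in($node) . "\n" }

# Dispatch table of references to named subs, not of closures
my %action = (a => \&handle_link, p => \&handle_paragraph);

# Concatenate the text of every child of a node
sub get_all_text_in {
    my $node = shift;
    return join '', map { process_text_in($_) } $node->content_list;
}

# Plain text is returned as-is; special tags go through the dispatch table
sub process_text_in {
    my $node = shift;
    return $node unless ref $node;
    my $handler = $action{ $node->tag };
    return $handler ? $handler->($node) : get_all_text_in($node);
}

my $tree = MockNode->new(
    div => MockNode->new(p => 'See ', MockNode->new(a => 'the docs'), '.'),
           MockNode->new(span => 'Bye'),
);
my $text = get_all_text_in($tree);
print $text, "\n";    # prints "See [the docs]." then "Bye"
```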

Thank you for any insight or discussion.

Replies are listed 'Best First'.
Re: Questions about Recursion and "Extract your transversal", by chromatic by Anonymous Monk on Jun 07, 2013 at 00:46 UTC
"Sorry, I'm not clear enough on how leaks work. What should I read?"

<best cartoon voice>boy-I-got-a-copy-paste-a-for-a-youu :D

Tutorials: Variable Scoping in Perl: the basics, Coping with Scoping, Mini-Tutorial: Perl's Memory Management, Lexical scoping like a fox, Read this if you want to cut your development time in half!, Closure on Closures, perlref#Circular References, Memory leaks and circular references, Circular references and Garbage collection., make perl release memory, about memory management, Re^3: Scope of lexical variables in the main script
Re^2: Questions about Recursion and "Extract your transversal", by chromatic by mascip (Pilgrim) on Jun 07, 2013 at 07:56 UTC
Thank you. Now I understand the two sources of circularity in the original solution. With my solution they would be gone, as we would be using named subroutines instead of references to a subroutine. Thus, using get_all_text_in() in the hash won't create any leak.

I can now add to the list of pros and cons above:

+ no risk of inducing a memory leak

I guess personal preferences will differ here. Personally, I would go for the "named subroutines" solution, because I can spot recursion errors more easily than memory leaks, and because I like simple logic. I am interested to hear people's reasons for preferring one or the other.
Re^3: Questions about Recursion and "Extract your transversal", by chromatic by mascip (Pilgrim) on Jun 10, 2013 at 17:13 UTC
In fact, the solution with __SUB__ could use a named subroutine, in which case there's no more risk of a memory leak. Is that right? We are then left with the pros and cons that I outlined in my first message, and I don't know which I would prefer.
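A small sketch of how __SUB__ behaves in both settings, assuming Perl 5.16 or later (where "use v5.16" enables the current_sub feature). The Sentinel class, depth_named() and anon_was_freed() are hypothetical names invented for this illustration. It covers only the self-reference of the recursive sub; closures stored in a dispatch hash that capture $traverse are a separate matter.

```perl
use v5.16;    # enables strict and the current_sub feature (__SUB__)
use warnings;

# Hypothetical helper: sets a flag when the object is destroyed
package Sentinel;
sub new     { my ($class, $flag_ref) = @_; return bless { flag => $flag_ref }, $class }
sub DESTROY { ${ $_[0]{flag} } = 1 }

package main;

# A named subroutine can recurse through __SUB__ as well as by name
sub depth_named {
    my $depth = shift;
    return $depth <= 0 ? 0 : 1 + __SUB__->($depth - 1);
}

# An anonymous sub that recurses through __SUB__ does not capture $traverse,
# so it is freed at scope exit without any "undef $traverse"
sub anon_was_freed {
    my $freed = 0;
    {
        my $sentinel = Sentinel->new(\$freed);
        my $traverse = sub {
            my $depth = shift;
            my $class = ref $sentinel;    # makes the closure capture $sentinel
            return $depth <= 0 ? 0 : 1 + __SUB__->($depth - 1);
        };
        $traverse->(3);
    }
    return $freed;
}

say "named recursion depth: ", depth_named(3);       # 3
say "anonymous sub freed:   ", anon_was_freed();     # 1
```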