in reply to Passing a list as a subroutine's return value ( list of HTML::Element objects )

You are freeing the tree including all of its nodes before you're done with the nodes.

But it turns out you don't need to return the nodes. You just want the stringification of the elements, so stringify them in the sub instead of in the caller.

sub unescape_entities { my $html = shift; my $tree = HTML::TreeBuilder::XPath->new; $tree->parse( decode_entities( $html->as_XML ) ); my @c = map "$_", $tree->findnodes( '//body/*' ); $tree->delete; return @c; }

Better yet (since it properly handles exceptions):

use Sub::ScopeFinalizer qw( scope_finalizer ); sub unescape_entities { my $html = shift; my $tree = HTML::TreeBuilder::XPath->new; my $anchor = scope_finalizer { $tree->delete; }; $tree->parse( decode_entities( $html->as_XML ) ); return map "$_", $tmp->findnodes( '//body/*' ); }

Note: Does passing XML to HTML::TreeBuilder::XPath make sense???

Replies are listed 'Best First'.
Re^2: Passing a list as a subroutine's return value ( list of HTML::Element objects )
by mldvx4 (Friar) on Oct 02, 2023 at 09:54 UTC

    That works well. It also led to a third way. As a result, to put it into a larger context, I now have a working script which can access MySQL tables from a defunct WordPress instance and produce static HTML in a file system hierarchy matching the structure in the original WordPress URLs. That's not generic enough to package, but might get shown to the public later this year.

    Going slightly off-topic, one hard part was that WordPress comments are nested and SQL is not as suited for that as a regular key-value database. So a recursive query was needed. I had to cargo cult that, but creating a matching Perl routine to process the recursive query (once I had the query) was rather easy.

    . . . my $query = qq( with recursive cte (comment_ID, comment_post_ID, comment_author +, comment_parent, comment_date_gmt, comment_type, comment_con +tent) as ( select comment_ID, comment_post_ID, comment_author, comment_parent, comment_date_gmt, comment_type, comment_content from wp_comments where comment_ID = ? AND comment_approved = 1 union all select p.comment_ID, p.comment_post_ID, p.comment_author, p.comment_parent, p.comment_date_gmt, p.comment_type, p.comment_content from wp_comments p inner join cte on p.comment_parent = cte.comment_ID ) SELECT * FROM cte ORDER BY comment_date_gmt; ); my $sth = $dbh->prepare($query); $sth->execute($id); while(my $row = $sth->fetchrow_hashref) { my $cid = $row->{comment_ID}; my $parent_id = $row->{comment_parent}; my $post_id = $row->{comment_post_ID}; if ($parent_id eq 0) { push(@{$posts{$post_id}}, $cid); } $comments{$cid}->{comment_post_ID} = $row->{comment_post_ID}; $comments{$cid}->{comment_parent} = $row->{comment_parent}; $comments{$cid}->{comment_author} = $row->{comment_author}; $comments{$cid}->{comment_date_gmt} = $row->{comment_date_gmt} +; my $content = $row->{comment_content}; $content =~ s|(\s*)\n(\s*)\n|$1<br />\n$2<br />\n|gm; $comments{$cid}->{comment_content} = $content; push (@{$hierarchy{$parent_id}}, $cid); } $sth->finish(); . . .

    The matching subroutine has to be recursive:

    sub render { my ($layer, $k, $comments, $hierarchy) = (@_); my $comment = $hierarchy{$k}; if (!defined($comment)){ return(0); } $layer++; my $ul = HTML::Element->new('ul'); my $count = 0; foreach my $c (@{$comment}) { my $li = HTML::Element->new('li'); $li->attr('id', "comment$c"); my $p1 = HTML::Element->new('p'); $p1->attr('class','author'); $p1->push_content($comments{$c}->{comment_author} ); $li->push_content($p1); my $p2 = HTML::Element->new('p'); $p2->attr('class','date'); $p2->push_content($comments{$c}->{comment_date_gmt}); $li->push_content($p2); my $div = HTML::Element->new('div'); $div->attr('class','words'); $div->push_content($comments{$c}->{comment_content}); $li->push_content($div); $ul->push_content($li); print "."x$layer,$c,"\n" if ($VERBOSE); my $html = &render($layer, $c, $comments, $hierarchy); if ($html) { $ul->push_content($html); $count++; } } return($ul); }

    The stringification now happens higher up in the work flow.

    Thanks, and thanks to all who read and or replied.