in reply to Re: Avoiding escaped child elements with HTML::TreeBuilder::XPath or HTML::Element
in thread Avoiding escaped child elements with HTML::TreeBuilder::XPath or HTML::Element

Thanks, haukex and marto. In this case, I just want to trim the unnecessary whitespace from the start and end of a few elements and attributes. The attributes are easy to work with, so that part is solved. However, I am not sure how to apply a substitution, s///, to an element containing more than just text.

Re^3: Avoiding escaped child elements with HTML::TreeBuilder::XPath or HTML::Element
by haukex (Archbishop) on Nov 15, 2021 at 12:50 UTC
    However, I am not sure how to apply a substitution, s///, to an element containing more than just text.

    The documentation of HTML::Element's content_refs_list gives an example of how to modify text nodes contained in an element and the documentation of HTML::Element::traverse shows how to use a recursive function to walk the tree. Putting those together:

    sub html_trim {
        my $elem = shift;
        for my $itemref ($elem->content_refs_list) {
            if ( ref $$itemref ) { html_trim($$itemref) }  # remove this for non-recursive
            else { $$itemref =~ s/^\s+|\s+$//g }
        }
    }
    for my $elem ($xhtml->findnodes('//div/ul/li')) { html_trim($elem) }
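
A possible variation, not from the thread: html_trim strips both ends of every text node, so whitespace next to inline children is removed too. If only the outer edges of the element should be trimmed, a sketch along these lines might be closer to the goal. The names trim_leading and trim_trailing are hypothetical, and the hand-built $li exists only for the demo; it assumes content_refs_list behaves as described in the HTML::Element documentation.

```perl
use strict;
use warnings;
use HTML::Element;

# Hypothetical helper: trim whitespace at the leading edge only.
# If the first content item is a child element, descend into it.
sub trim_leading {
    my $elem = shift;
    my @refs = $elem->content_refs_list;
    return unless @refs;
    my $first = $refs[0];
    if ( ref $$first ) { trim_leading($$first) }
    else               { $$first =~ s/^\s+// }
}

# Hypothetical helper: same idea for the trailing edge.
sub trim_trailing {
    my $elem = shift;
    my @refs = $elem->content_refs_list;
    return unless @refs;
    my $last = $refs[-1];
    if ( ref $$last ) { trim_trailing($$last) }
    else              { $$last =~ s/\s+$// }
}

# Demo on a hand-built element: <li>  foo <b> bar </b> baz  </li>
my $b = HTML::Element->new('b');
$b->push_content(' bar ');
my $li = HTML::Element->new('li');
$li->push_content( '  foo ', $b, ' baz  ' );

trim_leading($li);
trim_trailing($li);

# Inner spacing around <b> is left alone; only the outer edges change.
print '[', $li->as_text, "]\n";
```

One limitation of this sketch: if the first or last text node consists only of whitespace, it becomes an empty string and the neighbouring node is not examined.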