http://qs1969.pair.com?node_id=11138828


in reply to Avoiding escaped child elements with HTML::TreeBuilder::XPath or HTML::Element

Apart from adding the text 'HTML::TreeBuilder::XPath' and, in places, some whitespace (which has no impact), what are you looking to change? Regardless, Mojo::DOM is my go-to for DOM manipulation. See your previous question, Data structure question from XML::XPath::XMLParser.

Re^2: Avoiding escaped child elements with HTML::TreeBuilder::XPath or HTML::Element
by mldvx4 (Friar) on Nov 15, 2021 at 12:39 UTC

    Thanks haukex and marto. In this case, I just want to trim the unnecessary whitespace from the start and end of a few elements and attributes. The attributes are easy to work with, so that is solved. However, I am not sure how to apply a substitution, s///, to an element containing more than just text.

      However, I am not sure how to apply a substitution, s///, to an element containing more than just text.

      The documentation of HTML::Element's content_refs_list gives an example of how to modify text nodes contained in an element and the documentation of HTML::Element::traverse shows how to use a recursive function to walk the tree. Putting those together:

      sub html_trim {
          my $elem = shift;
          for my $itemref ($elem->content_refs_list) {
              if ( ref $$itemref ) { html_trim($$itemref) }  # remove this for non-recursive
              else { $$itemref =~ s/^\s+|\s+$//g }
          }
      }
      for my $elem ($xhtml->findnodes('//div/ul/li')) { html_trim($elem) }
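
      One thing to watch for: the substitution above is applied to every text node, so in markup like "one <b>two</b> three" the spaces on either side of the inline element are stripped as well. If only the outer edges of the element should be trimmed, a variant along these lines might do. This is only a sketch: the name html_trim_edges and the sample markup are made up for illustration, and no_space_compacting(1) is set just so the parser keeps the whitespace as written.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical helper: trim only the outer edges of an element.
# Leading whitespace is removed from the first child if it is text,
# trailing whitespace from the last child if it is text.
# Child elements and inner text nodes are left untouched.
sub html_trim_edges {
    my $elem = shift;
    my @refs = $elem->content_refs_list;
    return unless @refs;
    ${ $refs[0] }  =~ s/^\s+// unless ref ${ $refs[0] };
    ${ $refs[-1] } =~ s/\s+$// unless ref ${ $refs[-1] };
    return;
}

my $xhtml = HTML::TreeBuilder::XPath->new;
$xhtml->no_space_compacting(1);    # keep whitespace exactly as written
$xhtml->parse_content('<div><ul><li>  one <b>two</b> three  </li></ul></div>');

for my $elem ($xhtml->findnodes('//div/ul/li')) {
    html_trim_edges($elem);
    print '[', $elem->as_text, "]\n";
}

$xhtml->delete;
```

      The spaces next to the <b> element survive because only the first and last entries of the content list are touched. Whitespace nested inside a first or last child element is not trimmed by this sketch; that would need a recursive descent into those children.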