PerlMonks  

Re: Deleting nodes with HTML::TreeBuilder::XPath

by tangent (Parson)
on Jun 20, 2019 at 14:05 UTC (id://11101630)


in reply to Deleting nodes with HTML::TreeBuilder::XPath

Are you sure you are selecting the right nodes?
'/html/body//div/ul/li' will select list items, not the list itself.

Given this HTML:

    <div>
      <ul>
        <li>one</li>
        <li>two</li>
        <li>three</li>
      </ul>
      <!-- List with 2 empty List Items -->
      <ul>
        <li></li>
        <li>two</li>
        <li></li>
      </ul>
      <!-- Empty List -->
      <ul>
      </ul>
    </div>

Run on list items, it does delete empty items:

    sub delete_empty_list_item {
        my $xhtml = HTML::TreeBuilder->new;
        $xhtml->implicit_tags(1);
        $xhtml->parse_file($file);
        for my $list_item ($xhtml->findnodes('/html/body//div/ul/li')) {
            if ($list_item->is_empty) {
                print qq(DELETE\n);
                $list_item->delete();
            }
        }
        print $xhtml->as_XML_indented;
        $xhtml->eof;
    }

    OUTPUT:
    <div>
      <ul>
        <li>one</li>
        <li>two</li>
        <li>three</li>
      </ul>
      <ul>
        <li>two</li>
      </ul>
      <ul>
      </ul>
    </div>

Run on list elements, it deletes the list itself:

    sub delete_empty_list {
        my $xhtml = HTML::TreeBuilder->new;
        $xhtml->implicit_tags(1);
        $xhtml->parse_file($file);
        for my $list ($xhtml->findnodes('/html/body//div/ul')) {
            if ($list->is_empty) {
                print qq(DELETE\n);
                $list->delete();
            }
        }
        print $xhtml->as_XML_indented;
        $xhtml->eof;
    }

    OUTPUT:
    <div>
      <ul>
        <li>one</li>
        <li>two</li>
        <li>three</li>
      </ul>
      <ul>
        <li></li>
        <li>two</li>
        <li></li>
      </ul>
    </div>

You could combine the two if you need to delete both empty items and empty lists.
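
A rough sketch of the combined version, in case it helps. This is my own arrangement rather than tested production code: the sub name delete_empty_items_and_lists, the inline $html string, and the extra fourth list (a list holding only an empty item) are assumptions added for illustration. The point it tries to show is ordering: removing empty items first means a list that contained nothing but empty items is itself empty by the time the second pass looks at it.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Sample input: the HTML from above plus one extra list (hypothetical,
# added here) whose only item is empty.
my $html = <<'HTML';
<div>
  <ul> <li>one</li> <li>two</li> <li>three</li> </ul>
  <ul> <li></li> <li>two</li> <li></li> </ul>
  <ul> </ul>
  <ul> <li></li> </ul>
</div>
HTML

# Hypothetical helper: deletes empty <li> elements, then empty <ul> elements,
# and returns how many nodes were removed in total.
sub delete_empty_items_and_lists {
    my ($tree) = @_;
    my $deleted = 0;

    # Pass 1: empty list items. findnodes returns the full list up front,
    # so deleting inside the loop does not disturb the iteration.
    for my $item ($tree->findnodes('/html/body//div/ul/li')) {
        next unless $item->is_empty;
        $item->delete;
        $deleted++;
    }

    # Pass 2: lists that are empty now, including any emptied by pass 1.
    for my $list ($tree->findnodes('/html/body//div/ul')) {
        next unless $list->is_empty;
        $list->delete;
        $deleted++;
    }

    return $deleted;
}

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse($html);
$tree->eof;

my $count = delete_empty_items_and_lists($tree);
my $out   = $tree->as_XML_indented;

print "deleted $count node(s)\n";
print $out;
```

If you run the two passes in the opposite order, the list that only held an empty item survives as an empty list, so you would need a second run to clean it up.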

 
