I do not know if there is a CPAN module that interfaces these external tools, but you may be interested in hxpipe that W3C provides. For example,
hxpipe (1) - convert XML to a format easier to parse with Perl or AWKHere's the full list, which includes tools that support selecting elements.
cexport (1) - create headerfile of exported declarations from a C file hxaddid (1) - add ID's to selected elements hxcite (1) - replace bibliographic references by hyperlinks hxcite-mkbib (1) - expand references and create bibliography hxcopy (1) - copy an HTML file while preserving relative links hxcount (1) - count elements and attributes in HTML or XML files hxextract (1) - extract selected elements hxclean (1) - apply heuristics to correct an HTML file hxprune (1) - remove marked elements from an HTML file hxincl (1) - expand included HTML or XML files hxindex (1) - create an alphabetically sorted index hxmkbib (1) - create bibliography from a template hxmultitoc (1) - create a table of contents for a set of HTML files hxname2id - move some ID= or NAME= from A elements to their parents hxnormalize (1) - pretty-print an HTML file hxnum (1) - number section headings in an HTML file hxpipe (1) - convert XML to a format easier to parse with Perl or AWK hxprintlinks (1) - number links & add table of URLs at end of an HTML file hxremove (1) - remove selected elements from an XML file hxtabletrans (1) - transpose an HTML or XHTML table hxtoc (1) - insert a table of contents in an HTML file hxuncdata (1) - replace CDATA sections by character entities hxunent (1) - replace HTML predefined character entities to UTF-8 hxunpipe (1) - convert output of pipe back to XML format hxunxmlns (1) - replace "global names" by XML Namespace prefixes hxwls (1) - list links in an HTML file hxxmlns (1) - replace XML Namespace prefixes by "global names" asc2xml, xml2asc (1) - convert between UTF8 and nnn; entities hxref (1) - generate cross-references hxselect (1) - extract elements that match a (CSS) selectorAnd FWIW, Sphinx also provides HTML stripping. Not sure how you'd use it, but it can be done when ingesting data for indexing.
In reply to Re: Looking for a module that strips an HTML tag and its associated 'TEXT'
by perlfan
in thread Looking for a module that strips an HTML tag and its associated 'TEXT'
by nysus
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |