in reply to Extracting paragraphs from html
Parse the page with XML::LibXML in HTML-parsing mode, then use an XPath expression that selects the text() nodes whose length is greater than some threshold N.
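A minimal sketch of that approach follows. The helper name long_text_nodes and the threshold of 40 characters are illustrative assumptions, not part of the original suggestion. It also wraps the node in normalize-space() so that whitespace-only text between tags never counts toward the length.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: return the text of every text() node whose
# whitespace-normalized length exceeds $min_len (the "N" above).
sub long_text_nodes {
    my ($html, $min_len) = @_;

    # load_html uses libxml2's HTML parser; recover => 2 keeps going
    # past malformed markup without printing warnings.
    my $doc = XML::LibXML->load_html(
        string  => $html,
        recover => 2,
    );

    # XPath 1.0: keep only text nodes longer than $min_len characters.
    my $xpath = "//text()[string-length(normalize-space(.)) > $min_len]";
    return map { $_->data } $doc->findnodes($xpath);
}

my $html = <<'HTML';
<html><body>
<div class="nav">Home | About</div>
<p>This paragraph is long enough to clear the threshold, so it is reported.</p>
<p>Too short.</p>
</body></html>
HTML

# Print each sufficiently long run of text on its own line.
for my $text (long_text_nodes($html, 40)) {
    print "$text\n";
}
```

A text node is not always a whole paragraph (inline markup such as <b> splits one paragraph into several text nodes), so you may want to test string-length() on the parent <p> element instead, depending on the pages you are scraping.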
update: See Locate large HTML paragraphs with XML::LibXML.
-- Randal L. Schwartz, Perl hacker
Be sure to read my standard disclaimer if this is a reply.
In section: Seekers of Perl Wisdom