Locate large HTML paragraphs with XML::LibXML

Inspired by Re: Extracting paragraphs from html, here's a bit of XML::LibXML code that fetches a web page and prints every large paragraph, meaning any text node longer than 100 characters.

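use strict;
use warnings;
use XML::LibXML;

my $parser = XML::LibXML->new;
$parser->recover(1);    # keep parsing through malformed real-world HTML

# Fetching by URL relies on libxml2's own HTTP support being available.
my $doc = do {
  # Send STDOUT and STDERR to /dev/null while parsing to hide recovery noise.
  local *STDOUT;
  local *STDERR;
  open STDOUT, '>', '/dev/null' or die "Cannot redirect STDOUT: $!";
  open STDERR, '>', '/dev/null' or die "Cannot redirect STDERR: $!";
  $parser->parse_html_file("http://www.example.com/some/url");
};

# Select every text node longer than 100 characters.
for my $node ($doc->findnodes(q{//text()[string-length() > 100]})) {
  print $node->toString, "\n";
}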
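
To try the XPath without a network fetch, here is a minimal sketch that parses an inline HTML string instead. The helper name large_text_nodes, its $min_len parameter, and the sample markup are made up for illustration and are not from the original post. It also swaps the STDOUT/STDERR redirection for recover_silently, which XML::LibXML documents as recovery mode without the warnings; treat that as one possible alternative, not the post's method.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper (not from the original post): return the content of
# every text node in an HTML string that is longer than $min_len characters.
sub large_text_nodes {
    my ($html, $min_len) = @_;

    my $parser = XML::LibXML->new;
    $parser->recover_silently(1);    # recover from bad markup without warnings

    my $doc = $parser->parse_html_string($html);

    # %d forces $min_len to an integer before it goes into the XPath.
    my $xpath = sprintf '//text()[string-length() > %d]', $min_len;
    return map { $_->textContent } $doc->findnodes($xpath);
}

# Sample markup (an assumption): one short paragraph, one 150-character one.
my $html = '<html><body><p>short</p><p>' . ('x' x 150) . '</p></body></html>';

print "$_\n" for large_text_nodes($html, 100);
```

Only the second paragraph should be printed, since "short" falls well under the 100-character threshold. Raising or lowering $min_len is the simplest way to tune what counts as "large".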
