Re: Saving a Pattern Match from Subroutine

I really like using Web::Scraper for that, or rather, the way it uses HTML::TreeBuilder::XPath and HTML::Selector::XPath to select and extract tags:

use strict;
use warnings;
use Web::Scraper;
use Data::Dumper;

my $data = do { local $/; <DATA> };

# Weirdo syntax of Web::Scraper:
# for each <a>, grab its href attribute and its text content
my $link = scraper {
    process 'a',
        href => '@href',
        text => 'TEXT';
    result 'href', 'text';
};

my $scraper = scraper {
    process 'span.inst a',
        'links[]' => $link;
    # 'links[]' above stores an array ref under the key 'links'
    result 'links';
};

print Dumper $scraper->scrape($data);

__DATA__
<html>
<body>
...
 <span class=inst>
   <a href="file23.html#some_tag">aaa</a>
 </span>
 <span class=inst>
   <a href="file24.html#some_tag">bbb</a>
 </span>
 <span class=no_inst>
   <a href="file24.html#some_tag">(should not match either due to wrong span class)</a>
 </span>
 <a href="file23.html#some_tag">a bare link (should not match)</a>
</body>
</html>

Getting the syntax of Web::Scraper right isn't always straightforward (to me at least), but I hope that some better, non-code-based way of configuring it will come soon.
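
If the Web::Scraper DSL gets in the way, the same extraction can be sketched with the two underlying modules directly. This is only a minimal sketch, not what Web::Scraper does internally line for line: the inline sample markup and the variable names ($html, @links) are made up for illustration, and it assumes HTML::TreeBuilder::XPath and HTML::Selector::XPath are installed.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;
use HTML::Selector::XPath qw(selector_to_xpath);

# Sample markup shaped like the __DATA__ section above (illustrative only)
my $html = <<'HTML';
<html><body>
 <span class="inst"><a href="file23.html#some_tag">aaa</a></span>
 <span class="inst"><a href="file24.html#some_tag">bbb</a></span>
 <span class="no_inst"><a href="file24.html#some_tag">wrong span class</a></span>
 <a href="file23.html#some_tag">a bare link</a>
</body></html>
HTML

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse_content($html);

# Translate the CSS selector into an XPath expression
my $xpath = selector_to_xpath('span.inst a');

# Collect href and link text for every matching <a>
my @links = map { +{ href => $_->attr('href'), text => $_->as_text } }
            $tree->findnodes($xpath);

print "$_->{href} => $_->{text}\n" for @links;

# Free the parse tree explicitly
$tree->delete;
```

With the sample markup, only the two links inside span.inst should end up in @links; the link in the no_inst span and the bare link are skipped by the selector.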
