I need to process the following HTML using Web::Scraper, and produce a data structure (see below).
The HTML looks like this:
<h4 class="bla">July 12</h4>
<p>Tim</p>
<p>Jon</p>
<h4 class="bla">July 13</h4>
<p>James</p>
<p>Eric</p>
<p>Jerry</p>
<p>Susie</p>
<h4 class="date">July 14</h4>
<p>Kami</p>
<p>Darryl</p>
I would like to create the following data structure (an AoH), though any suitable data structure that associates each name with the proper date would do.
[
    { 'July 12' => [ 'Tim', 'Jon' ] },
    { 'July 13' => [ 'James', 'Eric', 'Jerry', 'Susie' ] },
    { 'July 14' => [ 'Kami', 'Darryl' ] },
]
I know I can accomplish this with other modules, but I need to be able to do this with the Web::Scraper module, if at all possible.
I am starting off by trying to figure out how to do it for just one of the dates, July 12. Once I get that working, I'll try to do the same thing for all the dates, which is ultimately what I need.
What I've got so far is this:
my $names = scraper {
    process '//h4[@class="bla" and . = "July 12"]', 'dates[]' => scraper {
        process 'p', 'name' => 'TEXT';
    };
};
I know my first XPath expression is finding the correct h4 tag, but the problem is that the p tags I need are its siblings, not its children/descendants, so the expression 'p' in the nested scraper construct is not finding any 'p' tags.
My full script looks like this:
use strict;
use warnings;
use Web::Scraper;
use Data::Dumper;

my $sample = q{
<h4 class="bla">July 12</h4>
<p>Tim</p>
<p>Jon</p>
<h4 class="bla">July 13</h4>
<p>James</p>
<p>Eric</p>
<p>Jerry</p>
<p>Susie</p>
<h4 class="date">July 14</h4>
<p>Kami</p>
<p>Darryl</p>
};

my $names = scraper {
    process '//h4[@class="bla" and . = "July 12"]', 'dates[]' => scraper {
        process 'p', 'name' => 'TEXT';
    };
};

my $res = $names->scrape( $sample );
print Dumper $res;
That outputs the following:
$VAR1 = {
          'dates' => [
                       {}
                     ]
        };
Any help with this problem would be appreciated.