Thanks to everyone who posted a solution. I learned a lot by reading through the different approaches to the problem.
I also ended up working out a solution using nothing but Web::Scraper (one of my requirements), and wanted to post it here:
    use strict;
    use warnings;
    use Web::Scraper;
    use Data::Dumper;

    my $sample = q{
    <html>
      <body>
        <h4 class="bla">July 12</h4>
        <p>Tim</p>
        <p>Jon</p>
        <h4 class="bla">July 13</h4>
        <p>James</p>
        <p>Eric</p>
        <p>Jerry</p>
        <p>Susie</p>
        <h4 class="bla">July 14</h4>
        <p>Kami</p>
        <p>Darryl</p>
      </body>
    </html>
    };

    my $names = scraper {
        process 'h4.bla', 'names[]' => sub {
            my $elem  = shift;
            my $date  = $elem->as_text;
            my @names = ();
            for my $node ($elem->parent->findnodes(
                "//p[preceding-sibling::h4[1][. = '$date']]" )) {
                push @names, $node->as_text;
            }
            return { $date => \@names };
        };
    };

    my $res = $names->scrape( $sample );
    print Dumper $res;
That will output the following:
    $VAR1 = {
              'names' => [
                           {
                             'July 12' => [
                                            'Tim',
                                            'Jon'
                                          ]
                           },
                           {
                             'July 13' => [
                                            'James',
                                            'Eric',
                                            'Jerry',
                                            'Susie'
                                          ]
                           },
                           {
                             'July 14' => [
                                            'Kami',
                                            'Darryl'
                                          ]
                           }
                         ]
            };
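One thing I noticed afterwards: the XPath interpolates the heading text into the expression, so a heading containing a single quote would break it, and two headings with the same text would each pick up the other's names. Below is a rough sketch of a variant that walks the sibling nodes instead. It assumes the node handed to the callback is an HTML::Element (so its `right` method is available), and the cut-down sample and the `$by_date` name are just for illustration. I have only tried to reason it through against the sample above, so treat it as a starting point rather than a drop-in replacement.

```perl
use strict;
use warnings;
use Web::Scraper;
use Data::Dumper;

# Cut-down version of the sample above, for illustration only.
my $sample = q{
<html>
  <body>
    <h4 class="bla">July 12</h4>
    <p>Tim</p>
    <p>Jon</p>
    <h4 class="bla">July 13</h4>
    <p>James</p>
  </body>
</html>
};

my $by_date = scraper {
    process 'h4.bla', 'names[]' => sub {
        my $elem = shift;
        my @names;
        # Assumption: $elem is an HTML::Element, so right() in list
        # context returns every sibling that follows it.
        for my $sib ($elem->right) {
            next unless ref $sib;         # skip bare text nodes
            last if $sib->tag eq 'h4';    # the next heading ends this group
            push @names, $sib->as_text if $sib->tag eq 'p';
        }
        return { $elem->as_text => \@names };
    };
};

my $res = $by_date->scrape( $sample );
print Dumper $res;
```

The result has the same shape as before (an array of single-key hashes under 'names'), but the grouping depends only on document order, not on the heading text.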
Again, thanks to everyone for the help, you guys are awesome!
In reply to Re: Extracting data-structure from HTML using Web::Scraper
by windowbreaker
in thread Extracting data-structure from HTML using Web::Scraper
by windowbreaker