Thanks to everyone who posted a solution. I learned a lot by reading through the different approaches to the problem.

I also ended up working out a solution using nothing but Web::Scraper (one of my requirements) and wanted to post it here:

use strict;
use warnings;
use Web::Scraper;
use Data::Dumper;

my $sample = q{
<html>
<body>
    <h4 class="bla">July 12</h4>
    <p>Tim</p>
    <p>Jon</p>
    <h4 class="bla">July 13</h4>
    <p>James</p>
    <p>Eric</p>
    <p>Jerry</p>
    <p>Susie</p>
    <h4 class="bla">July 14</h4>
    <p>Kami</p>
    <p>Darryl</p>
</body>
</html>
};

my $names = scraper {
    process 'h4.bla', 'names[]' => sub {
        my $elem = shift;
        my $date = $elem->as_text;
        my @names = ();
        for my $node ($elem->parent->findnodes( "//p[preceding-sibling::h4[1][. = '$date']]" )) {
            push @names, $node->as_text;
        }
        return { $date => \@names };
    };
};

my $res = $names->scrape( $sample );
print Dumper $res;

That will output the following:

$VAR1 = {
          'names' => [
                       {
                         'July 12' => [
                                        'Tim',
                                        'Jon'
                                      ]
                       },
                       {
                         'July 13' => [
                                        'James',
                                        'Eric',
                                        'Jerry',
                                        'Susie'
                                      ]
                       },
                       {
                         'July 14' => [
                                        'Kami',
                                        'Darryl'
                                      ]
                       }
                     ]
        };
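
If a single hash keyed by date is easier to work with than a list of one-key hashes, the result can be flattened afterwards. This is only a rough sketch: names_by_date is a hypothetical helper of my own, not part of Web::Scraper, and the data is hard-coded to match the dump above so the snippet runs without the module. Note that a plain hash does not keep the document order of the dates.

```perl
use strict;
use warnings;
use Data::Dumper;

# Same shape as the scraper result above, hard-coded for illustration.
my $res = {
    names => [
        { 'July 12' => [ 'Tim', 'Jon' ] },
        { 'July 13' => [ 'James', 'Eric', 'Jerry', 'Susie' ] },
        { 'July 14' => [ 'Kami', 'Darryl' ] },
    ],
};

# Hypothetical helper: merge the list of one-key hashes into one hash.
sub names_by_date {
    my ($result) = @_;
    my %by_date;
    for my $entry ( @{ $result->{names} } ) {
        # Each entry holds exactly one date => [names] pair.
        my ($date) = keys %$entry;
        # push (rather than assign) so a repeated date heading would be merged.
        push @{ $by_date{$date} }, @{ $entry->{$date} };
    }
    return \%by_date;
}

my $by_date = names_by_date($res);

# Sort keys so the dump is stable from run to run.
$Data::Dumper::Sortkeys = 1;
print Dumper $by_date;
```

After that, $by_date->{'July 13'} gives the list of names for that date directly.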

Again, thanks to everyone for the help. You guys are awesome!

In reply to Re: Extracting data-structure from HTML using Web::Scraper by windowbreaker
in thread Extracting data-structure from HTML using Web::Scraper by windowbreaker