Re: HTML Parsing (ick)

This little snippet should get you most of the way:

use strict;
use warnings;
use HTML::TreeBuilder::XPath;

my $html = q|<div class="listing">
Agave parryi&nbsp;&nbsp;&nbsp;&nbsp;
...
</div>|;

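# Turn non-breaking-space entities into plain spaces before parsing
$html =~ s/&nbsp;/ /g;
my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

# Each listing lives in its own <div class="listing">
my @nodes = $tree->findnodes('//div[@class="listing"]');
for my $node (@nodes) {
    # content_list returns child elements (objects) and bare text (plain strings)
    my @contents = $node->content_list;
    for my $content (@contents) {
        if (ref $content) {
            # Child element: print its tag and text, skipping empty ones
            my $text = $content->as_text;
            next unless length $text;
            my $tag = $content->tag;
            print "<$tag> $text\n\n";
        }
        else {
            # Bare text node directly inside the div
            print "$content\n\n";
        }
    }
}

# Release the tree when done
$tree->delete;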

Output:

Agave parryi 

<span> Parry's agave

$20.00 3 quart $12.00 Quart 

<span> Native

Sun to part shade Zones 5-10 Family: 

<i> Amaryllidaceae

From the Southwest comes this lovely agave. Thick spiny leaves adorn this hardy agave. Ultimate clump size is about 36" with each leaf being maybe 5" across. The flower stalk can reach 12 feet tall. Please plant in well drained soil in a place where children don't play. 

<span> Hummingbirds
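
The bare text nodes in that output keep whatever leading and trailing whitespace was in the markup. If you want tidier strings, a small helper along these lines might do. This is only a sketch: `clean_text` is a hypothetical name, not part of any module, and it assumes you are happy to collapse every run of whitespace (including any non-breaking spaces that survive parsing as the character 0xA0) into a single space.

```perl
use strict;
use warnings;

# Hypothetical helper (not from any module): collapse runs of whitespace,
# including non-breaking spaces (0xA0), into single spaces and trim both ends.
sub clean_text {
    my ($text) = @_;
    return '' unless defined $text;
    $text =~ s/[\s\x{A0}]+/ /g;   # \xA0 is listed explicitly; \s may not match it
    $text =~ s/^ | $//g;          # drop the one leading/trailing space left over
    return $text;
}

print clean_text("Agave parryi \x{A0}\x{A0} "), "\n";
```

In the loop above you could then write `print clean_text($content), "\n\n";` in the `else` branch, and likewise wrap the `as_text` result for child elements.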

Re^2: HTML Parsing (ick) by dbarron (Novice) on Aug 20, 2014 at 11:54 UTC
Ah, excellent, and my thanks, Tangent. Now I'll see if I can get the TreeBuilder module installed under Windows. I couldn't get one of its dependencies to compile cleanly under Linux; I think it may have been HTML::Entities.
Re^3: HTML Parsing (ick) by dbarron (Novice) on Aug 20, 2014 at 13:05 UTC
OK, after beating the dead horse (cpan) a bit, and manually running make and make install, I have your test program running and ready to expand. Thanks again.