in reply to HTML::Parser to extract link text?
I wouldn't use HTML::Parser if I only wanted parts of the document. H::P is nice if you want to iterate through an entire page, but maintaining state by hand quickly becomes tedious and error-prone.
HTML::TreeBuilder, which is based on HTML::Parser, is easier to use.
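    use strict;
    use warnings;
    use HTML::TreeBuilder;

    my $tree = HTML::TreeBuilder->new;
    $tree->parse_file("test.html");

    # Serialize an element's children: child elements as HTML, text nodes as-is.
    my $content_as_html = sub {
        join "", map { ref($_) ? $_->as_HTML : $_ } shift->content_list;
    };

    # Visit every <a> element that has a non-empty href attribute.
    for my $element ($tree->look_down(_tag => "a", href => qr/./)) {
        my $content = $element->$content_as_html;
        my $href    = $element->attr("href");
        $content =~ s/\n//g;    # the serialized HTML may contain newlines
        print ">> $href, $content\n";
    }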
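For comparison, here is a rough sketch of the state you would have to carry around if you did the same job with HTML::Parser directly. It is only an illustration: the `extract_links` helper and the sample markup are made up for this example, and unlike the TreeBuilder code it collects only the decoded text inside each link, not the inner HTML.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper (not part of HTML::Parser): returns a list of
# [href, text] pairs for every <a href="..."> found in $html.
sub extract_links {
    my ($html) = @_;
    my ($href, $text);    # state: the link we are currently inside, if any
    my @links;

    my $parser = HTML::Parser->new(
        api_version => 3,
        # Opening tag: start collecting when we enter an <a> with an href.
        start_h => [ sub {
            my ($tag, $attr) = @_;
            return unless $tag eq "a";
            return unless defined $attr->{href} && length $attr->{href};
            ($href, $text) = ($attr->{href}, "");
        }, "tagname, attr" ],
        # Text: only keep it while we are inside a link.
        text_h => [ sub {
            my ($chunk) = @_;
            $text .= $chunk if defined $href;
        }, "dtext" ],
        # Closing tag: store the finished link and reset the state.
        end_h => [ sub {
            my ($tag) = @_;
            return unless $tag eq "a" && defined $href;
            push @links, [ $href, $text ];
            undef $href;
        }, "tagname" ],
    );

    $parser->parse($html);
    $parser->eof;
    return @links;
}

# Made-up sample input, just to exercise the helper.
my $sample = '<p><a href="/x">Hi <b>there</b></a> and <a name="n">no href</a></p>';
print ">> $_->[0], $_->[1]\n" for extract_links($sample);
```

Three handlers and two state variables for one kind of element is manageable, but every extra thing you want to extract adds more flags to keep in sync, which is where the tree-based approach starts to pay off.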