I tried HTML::TreeBuilder, looking only for the cells we are interested in.
(I saved the page source to a file for testing.)
```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Local copy of the page source, saved for testing
my $filename = q{html/monk.html};

my $r = HTML::TreeBuilder->new;
$r->parse_file($filename) or die "cannot parse $filename: $!";

# Each listing sits in a cell like: <td width="48%" valign="top">
my @cells = $r->look_down(
    _tag   => q{td},
    width  => q{48%},
    valign => q{top},
);

my $i = 0;
for my $cell (@cells) {
    # The franchise name is in the first <b> element of the cell
    my $bold = $cell->look_down(_tag => q{b});
    print $bold->as_text, qq{\n} if $bold;

    # Print the text nodes directly inside the cell (contact, phone, fax);
    # child elements (<b>, <br>, <a>) are references, so skip them
    for my $item ($cell->content_refs_list) {
        next if ref $$item;
        print $$item, qq{\n};
    }

    # The link in the cell points at the franchise page
    my $link = $cell->look_down(_tag => q{a});
    print $link->attr(q{href}), qq{\n} if $link;
    print qq{\n};

    # Only show the first four cells for this extract
    last if $i++ > 2;
}

$r->delete;    # free the tree's memory
```

output (extract)

```
SERVPRO® of Central Alabama
Wilson, David & Christie
Phone: (205)678-2224
Fax: (205)678-2226
http://www.servpro.com/franchises/enhanced_asp/default.asp?fn=2196

SERVPRO® of South Alabama
Johnson, Walter G.
Phone: (251)661-9282
Fax: (251)660-7539
http://www.servpro.com/franchises/enhanced_asp/default.asp?fn=2212

SERVPRO® of Northern Alabama
Wilson, David & Christie
Phone: (205)678-2224
Fax: (205)678-2226
http://www.servpro.com/franchises/enhanced_asp/default.asp?fn=2233

SERVPRO® of Central Alabama II
Wilson, David & Christie
Phone: (205)678-2224
Fax: (205)678-2226
http://www.servpro.com/franchises/enhanced_asp/default.asp?fn=2226
```

Update: Tidied up the output.
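If you want to reuse the fields (say, to write them out as CSV) rather than just print them, the same look_down query can collect each cell into a hash. This is a minimal sketch, assuming the listing markup looks roughly like the hypothetical snippet below (a <b> name, <br>-separated text lines, one link); the real page may differ, and extract_listings is a name made up for illustration. It uses parse_content on a string so it runs without the saved file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical markup, modelled on the cells the script above targets;
# the real page layout may differ.
my $html = <<'HTML';
<table><tr>
<td width="48%" valign="top"><b>SERVPRO of South Alabama</b><br>
Johnson, Walter G.<br>Phone: (251)661-9282<br>Fax: (251)660-7539<br>
<a href="http://www.servpro.com/franchises/enhanced_asp/default.asp?fn=2212">Web site</a></td>
</tr></table>
HTML

# Hypothetical helper: collect each matching cell into a hashref
sub extract_listings {
    my ($source) = @_;
    my $tree = HTML::TreeBuilder->new;
    $tree->parse_content($source);    # parses the string and signals eof

    my @listings;
    for my $cell ($tree->look_down(_tag => 'td', width => '48%', valign => 'top')) {
        my $bold = $cell->look_down(_tag => 'b');
        my $link = $cell->look_down(_tag => 'a');

        # Text nodes directly inside the cell, trimmed, empty ones dropped
        my @lines = grep { length }
                    map  { s/^\s+|\s+$//gr }
                    grep { !ref } $cell->content_list;

        push @listings, {
            name  => $bold ? $bold->as_text : undef,
            lines => \@lines,
            url   => $link ? $link->attr('href') : undef,
        };
    }
    $tree->delete;    # free the tree once the data is copied out
    return \@listings;
}

my $listings = extract_listings($html);
for my $l (@$listings) {
    print join("\n", $l->{name}, @{ $l->{lines} }, $l->{url}), "\n\n";
}
```

Returning plain data and deleting the tree inside the helper keeps the parsing separate from the output format, so the printing loop can be swapped for a CSV writer without touching the extraction.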
In reply to Re: PERL HTML::TableExtractor by wfsp, in thread PERL HTML::TableExtractor by jdlev