in reply to Pulling info out of html pages
Here's the shell of a script that uses HTML::TokeParser to print the data you are looking for.
use strict;
use warnings;
use HTML::TokeParser;

# Names to look for inside HTML comments
my @sections = qw/ Poster Date /;

my $p = HTML::TokeParser->new( 'test.html' )
    or die "Can't open test.html: $!";

while ( my $token = $p->get_token ) {
    my ( $type, $text ) = @$token;
    if ( $type eq 'C' ) {    # we have an HTML comment
        foreach my $section ( @sections ) {
            if ( $text =~ /$section/ ) {
                # The data is the text of the next <b> tag
                $p->get_tag( "b" );
                my $data = $p->get_trimmed_text;
                print "$data\n";
                last;
            }
        }
    }
}
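If you want to try the idea without a test.html on disk, here is a minimal sketch that parses an in-memory string instead (HTML::TokeParser accepts a reference to a scalar as the document). The sample markup and the extract_sections helper are my own assumptions for illustration, not your real page: I'm guessing each value lives in the first <b> tag after a comment naming the section.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical sample page: each value sits in the first <b>
# that follows a marker comment. Adjust to match the real markup.
my $html = <<'HTML';
<html><body>
<!-- Poster -->
<p>Posted by <b> Ovid </b></p>
<!-- Date -->
<p>On <b>Jan 01, 2001</b></p>
<!-- Unrelated -->
<p><b>ignored</b></p>
</body></html>
HTML

# Hypothetical helper: returns a hash of section name => text of
# the first <b> found after the comment that mentions that section.
sub extract_sections {
    my ( $doc, @sections ) = @_;

    # A scalar reference is treated as the literal document to parse
    my $p = HTML::TokeParser->new( \$doc )
        or die "Can't create parser";

    my %found;
    while ( my $token = $p->get_token ) {
        my ( $type, $text ) = @$token;
        next unless $type eq 'C';    # only HTML comments are markers

        foreach my $section (@sections) {
            next unless $text =~ /\Q$section\E/;
            $p->get_tag('b');        # skip ahead to the next <b>
            $found{$section} = $p->get_trimmed_text;
            last;
        }
    }
    return %found;
}

my %data = extract_sections( $html, qw/ Poster Date / );
print "$_: $data{$_}\n" for sort keys %data;
```

Collecting the results in a hash rather than printing as you go makes it easier to tell which value belongs to which section, and the \Q...\E keeps any regex metacharacters in a section name from being interpreted.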
Cheers,
Ovid