in reply to More efficient use of HTML::TokeParser::Simple
Depending on what your input data looks like (and how many elements besides "title" and "h1" you want to handle), this might not do exactly what you want, but I hope it will put you on the right path.

    use strict;
    use warnings;
    use HTML::TokeParser::Simple;

    # The file to parse is given as the first command-line argument.
    my $p = HTML::TokeParser::Simple->new( $ARGV[0] );

    my $state   = '';                               # which element we are inside, if any
    my %content = ( inTitle => '', inH1 => '' );    # text collected per state

    while ( my $tkn = $p->get_token ) {
        if ( $tkn->is_start_tag( 'title' ) ) {
            $state = "inTitle";
        }
        elsif ( $tkn->is_start_tag( 'h1' ) ) {
            $state = "inH1";
        }
        elsif ( $tkn->is_end_tag( 'title' ) or $tkn->is_end_tag( 'h1' ) ) {
            $state = '';
        }
        elsif ( $tkn->is_text( ) and $state ) {
            # Only keep text that appears inside one of the tracked elements.
            $content{$state} .= $tkn->as_is;
        }
    }

    print "Title: $content{inTitle}\n";
    print "H1: $content{inH1}\n";
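If you end up wanting more elements than "title" and "h1", the same state idea can be driven by a list of tag names instead of one elsif per tag. What follows is only a rough sketch: the helper name collect_text and the sample HTML are made up for illustration, it assumes the tags you care about are not nested inside each other, and it parses from a string reference rather than a file so it can be run as is.

```perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

# Hypothetical helper (not part of the module): collect the text found
# inside each of the tags named in @wanted. Pass tag names in lower case.
sub collect_text {
    my ( $html, @wanted ) = @_;

    # Start every wanted tag with an empty string so nothing is undefined.
    my %content = map { $_ => '' } @wanted;

    # A reference to a scalar is parsed as a document held in memory.
    my $p = HTML::TokeParser::Simple->new( \$html );

    my $current = '';    # the wanted tag we are inside, if any
    while ( my $tkn = $p->get_token ) {
        if ( my ($tag) = grep { $tkn->is_start_tag($_) } @wanted ) {
            $current = $tag;
        }
        elsif ( $current and $tkn->is_end_tag($current) ) {
            $current = '';
        }
        elsif ( $tkn->is_text and $current ) {
            $content{$current} .= $tkn->as_is;
        }
    }
    return %content;
}

# Made-up sample input, just to show the call.
my %found = collect_text(
    '<html><head><title>Demo page</title></head>'
      . '<body><h1>Hello</h1><p>Body text</p><h2>Sub</h2></body></html>',
    qw(title h1 h2)
);
print "$_: $found{$_}\n" for sort keys %found;
```

The trade-off is one grep over the tag list per token, which should be fine for a handful of tags; for a long list a hash lookup on the tag name would probably be the better choice.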
Re^2: More efficient use of HTML::TokeParser::Simple
by henka (Novice) on Jul 11, 2006 at 09:21 UTC
by wfsp (Abbot) on Jul 11, 2006 at 12:52 UTC

Re^2: More efficient use of HTML::TokeParser::Simple
by henka (Novice) on Jul 11, 2006 at 06:46 UTC