in reply to HTML parsing using RegEx, HTML::Parser and or HTML::TokeParser?
Here's an example using Ovid's HTML::TokeParser::Simple, the one module you didn't mention in your post ;)
    #!/usr/bin/perl
    use strict;
    use warnings;
    use HTML::TokeParser::Simple ();

    # Parser states: ignore tokens, or collect them
    use constant SKIP => 0;
    use constant COPY => 1;

    die "usage: $0 inputfile > outputfile\n" if @ARGV != 1;

    my $p = HTML::TokeParser::Simple->new(shift);
    my @results;
    my $state = SKIP;

    while ( my $t = $p->get_token ) {

        # Start copying at a <table> with border="0" and a centered align.
        # The defined() checks avoid "uninitialized value" warnings on
        # tables that lack either attribute.
        if ( $state == SKIP && $t->is_start_tag('table') ) {
            my $attr = $t->return_attr;
            $state = COPY
                if defined $attr->{border} && $attr->{border} =~ /^0$/
                && defined $attr->{align}  && $attr->{align}  =~ /center/;
        }

        if ( $state == COPY && $t->is_end_tag('table') ) {
            # Stop at the first </table>; the end tag itself is not copied
            $state = SKIP;
        }
        elsif ( $state == COPY ) {
            push @results, $t->as_is;
        }
        elsif ( $state == SKIP ) {
            next;
        }
        else {
            die "I'm confused about my state ($state) at token " . $t->as_is;
        }
    }

    print "$_\n" for @results;
Thanks to Aristotle for helping me with a similar problem months ago.
--
Allolex