in reply to embedded table remover

Perhaps a more robust (and shorter) solution can be created on top of HTML::Table, part of LWP. Amazing how much reinvention happens (creating more fragile solutions) when you don't check the CPAN first. :)

-- Randal L. Schwartz, Perl hacker

Re: RE: embedded table remover
by salvadors (Pilgrim) on Dec 31, 2000 at 05:41 UTC
    HTML::Table is used for creating tables, rather than reading them. I suspect you meant HTML::TableExtract?

    Even so, I suspect that won't really work either, as it discards everything it doesn't need.

    You probably just want to build a handler onto HTML::Parser:

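    #!/usr/bin/perl -w
    use strict;
    use HTML::Parser;

    # Nesting depth of <table> elements; 0 means we are outside any table.
    my $in_table = 0;

    my $p = HTML::Parser->new(
        # Text, comments, declarations etc.: print only when outside a table.
        default_h => [ sub { print shift unless $in_table }, 'text' ],
        # Start tags: <table> increases the depth; other tags are printed
        # only when outside a table.
        start_h   => [ sub { shift eq 'table' ? $in_table++ : $in_table || print shift },
                       'tagname, text' ],
        # End tags: </table> decreases the depth; other tags are printed
        # only when outside a table.
        end_h     => [ sub { shift eq 'table' ? $in_table-- : $in_table || print shift },
                       'tagname, text' ],
    );

    $p->parse_file(shift || die "Need a file") || die $!;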
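
    Note that the handler above drops every table, not only nested ones. If "embedded" means tables inside another table, a variant along these lines might be closer to what was asked for. This is only a sketch: strip_nested_tables is a made-up helper name, and it assumes reasonably well-formed markup with matching <table> and </table> tags.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper (not from the original post): returns the HTML with
# every table nested inside another table removed. Outermost tables stay.
sub strip_nested_tables {
    my ($html) = @_;
    my $depth = 0;     # current <table> nesting level
    my $out   = '';    # accumulated output

    my $p = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub {
            my ($tag, $text) = @_;
            $depth++ if $tag eq 'table';
            # Keep the tag unless it belongs to (or opens) a nested table.
            $out .= $text if $depth <= 1;
        }, 'tagname, text' ],
        end_h => [ sub {
            my ($tag, $text) = @_;
            # Decide before decrementing, so a nested </table> is dropped too.
            $out .= $text if $depth <= 1;
            $depth-- if $tag eq 'table' && $depth > 0;
        }, 'tagname, text' ],
        # Everything else (text, comments, declarations, ...).
        default_h => [ sub {
            my ($text) = @_;
            $out .= $text if defined $text && $depth <= 1;
        }, 'text' ],
    );

    $p->parse($html);
    $p->eof;           # flush any buffered text
    return $out;
}

# Example use on a small, made-up snippet.
print strip_nested_tables(
    '<table><tr><td>outer<table><tr><td>inner</td></tr></table></td></tr></table>'
), "\n";
```

    Collecting the output in a string instead of printing directly makes the result easier to post-process or test; the depth check is the only real change from the handler above.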

    Tony

RE: RE: embedded table remover
by BigJoe (Curate) on May 27, 2000 at 06:46 UTC
    I read up on that and really didn't understand it. It showed how to access the data, but I just wanted to remove all the embedded tables.