in reply to Using HTML::Parser extract text from tables

You might want to save the current table number in @save as well, so that you can handle tables that contain both other tables and text. To do this, you could apply these changes:
At the top:
my $tablenr=0;
In _start:
if ($tag eq 'table') {
    push @save, [$tablenr, $row, $column];
    $row = $column = 0;
    $tablenr = $count;
    ++$count;
    $in_table++;
}
In _end:
if ($tag eq 'table') {
    ($tablenr, $row, $column) = @{ pop @save };
    --$in_table;
}
In Text:
$table[$tablenr][$row][$column] .= $text if ($in_table) && ($text !~ m/^\s+$/);
You might want to add an initialization of @save as well, to avoid trying to dereference an undefined value when you leave a top-level table. Something like this perhaps:
my @save=([]); #initialize with an empty list as first element.
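
For reference, here is a minimal sketch of how the changes above might fit together. It is not the original script: a hand-written event list stands in for HTML::Parser, the sub names are made up for the example, and the tr/td bookkeeping (bump $row on tr, bump $column on td) is an assumption, since the original handlers for those tags are not shown here.

```perl
use strict;
use warnings;

my @table;   # $table[$tablenr][$row][$column] holds the cell text
my @save;    # stack of [$tablenr, $row, $column] for enclosing tables
my ($count, $tablenr, $row, $column, $in_table) = (0, 0, 0, 0, 0);

sub start_tag {
    my ($tag) = @_;
    if ($tag eq 'table') {
        # Remember where we were in the enclosing table (if any).
        push @save, [$tablenr, $row, $column];
        $row = $column = 0;
        $tablenr = $count;    # every table gets a fresh number
        ++$count;
        $in_table++;
    }
    elsif ($tag eq 'tr') { ++$row; $column = 0 }   # assumed bookkeeping
    elsif ($tag eq 'td') { ++$column }             # assumed bookkeeping
}

sub end_tag {
    my ($tag) = @_;
    if ($tag eq 'table') {
        # Go back to the position in the enclosing table.
        ($tablenr, $row, $column) = @{ pop @save };
        --$in_table;
    }
}

sub text {
    my ($text) = @_;
    $table[$tablenr][$row][$column] .= $text
        if $in_table && $text !~ m/^\s+$/;
}

# Simulated parser events: an outer table whose single cell holds text,
# a nested table, and then more text.
my @events = (
    [start => 'table'], [start => 'tr'], [start => 'td'],
    [text  => 'outer A'],
    [start => 'table'], [start => 'tr'], [start => 'td'],
    [text  => 'inner'],
    [end   => 'td'], [end => 'tr'], [end => 'table'],
    [text  => ' outer B'],
    [end   => 'td'], [end => 'tr'], [end => 'table'],
);

my %handler = (start => \&start_tag, end => \&end_tag, text => \&text);
for my $event (@events) {
    my ($kind, $arg) = @$event;
    $handler{$kind}->($arg);
}

print "table $_: $table[$_][1][1]\n" for 0 .. $#table;
```

The text after the nested table ends up in the outer table's cell again, because the outer table number, row, and column are restored when the inner table closes.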
Regards, GoldClaw

Replies are listed 'Best First'.
Re: Re: Using HTML::Parser extract text from tables
by zzspectrez (Hermit) on Jan 18, 2001 at 12:07 UTC

    Giving thought to the problem of nested subs, I rewrote it as a module and fixed the error. See the new module. The following is a test script that uses it. Just enter the table, row, and column as table,row,col and it will print out the text. Type quit when done.

    If anyone else has any suggestions or improvements for the module, I would be glad to hear them.

    #!/usr/local/bin/perl -w
    use strict;
    use Table;

    my $table = Table->new;
    my $content = join '', ( <DATA> );
    $table->parse_it(\$content);

    print "INPUT TABLE,ROW,COL: ";
    while (my $inp = <STDIN>){
        chomp $inp;
        last if $inp eq 'quit';
        my ($x,$y,$z) = split ',', $inp;
        next unless ($x) && ($y) && ($z);
        print $table->[$x][$y][$z],"\n" if $table->[$x][$y][$z];
        print "INPUT TABLE,ROW,COL: ";
    }
    __END__
    <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
    <html>
    <head><title>tester.html</title></head>
    <body>
    <h1>tester.html</h1>
    <TABLE>
    <TR><TD>TABLE 1:ROW 1:COL1</TD><TD>TABLE 1:ROW1:COL2</TD></TR>
    <TR><TD><TABLE><TR><TD>TABLE 2:ROW1:COL1</TD></TR></TABLE></TD><TD>TABLE 1:ROW2:COL2</TD></TR>
    <TR><TD><TABLE><TR><TD><TABLE><TR><TD>TABLE4:ROW1:COL1</TD><TD>TABLE4:ROW1:COL2</TD></TR></TABLE>TABLE3:ROW1:COL1</TD></TR></TABLE></TD></TR>
    </TABLE>
    <TABLE>
    <TR><TD>TABLE5:ROW1:COL1</TD><TD>TABLE5:ROW1:COL2</TD></TR>
    <TR><TD>TABLE5:ROW2:COL1</TD></TR>
    </TABLE>
    <hr>
    </body>
    </html>
Re: Re: Using HTML::Parser extract text from tables
by zzspectrez (Hermit) on Jan 17, 2001 at 06:28 UTC
    You might want to add an initialization of @save as well, to avoid trying to dereference an undefined value when you leave a top-level table. Something like this perhaps:

    Actually, since @save is only accessed on an end tag, a begin tag must already have pushed a value onto it. For the first table, the values 0,0 will be pushed onto @save and then popped when that table ends. So I don't think I need to worry about dereferencing an undefined value.

    I don't think pushing $tablenr will fix the problem, because once the old value is popped, the next table will do $tablenr++ from it, which will overwrite the previous data. I'll have to do something else.

      You are right about that @save and undefined values thing. My brain must have fallen asleep there for a while.

      Pushing $tablenr will fix the recursive table case, though. You still increase $count each time you encounter a table start, and you also set $tablenr to $count. It's only on table end that you restore $tablenr, but you do _not_ restore $count. Hence, each table is given a unique, increasing number.
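
The numbering argument can be checked on paper with a small simulation that does not touch the real parser. The sketch below only tracks table numbers for the nesting used in the test page (table 1 contains tables 2 and 3, table 3 contains table 4, table 5 stands alone). The "increment only" branch is one hypothetical reading of the overwrite concern raised above, not code from the original script, and the sub name number_tables is made up for the example.

```perl
use strict;
use warnings;

# '(' stands for a <table> start tag, ')' for a </table> end tag.
# Nesting: T1{ T2{} T3{ T4{} } } T5{}
my @events = split //, '(()(()))()';

# Returns the number assigned to each table, in document order.
sub number_tables {
    my ($use_counter, @ev) = @_;
    my ($nr, $count, @save, @assigned) = (0, 0);
    for my $e (@ev) {
        if ($e eq '(') {
            push @save, $nr;
            if ($use_counter) { $nr = $count; ++$count }   # separate counter
            else              { ++$nr }                    # increment only
            push @assigned, $nr;
        }
        else {
            $nr = pop @save;    # restore the enclosing table's number
        }
    }
    return @assigned;
}

my @naive = number_tables(0, @events);
my @fixed = number_tables(1, @events);

print "increment only:   @naive\n";   # 1 2 2 3 1
print "separate counter: @fixed\n";   # 0 1 2 3 4
```

With increment only, tables 2 and 3 share a number, as do tables 1 and 5, so their cells would overwrite each other. With $count never restored, all five tables get distinct numbers.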

      Have you tried it, btw? I haven't, so I'm only speaking theoretically here...

      regards,

      GoldClaw