Re: Question on extracting HTML tables with HTML::TableExtract

Look at the bottom of the synopsis in the HTML::TableExtract docs; everything you need is there. I just ran this:

#!/usr/bin/perl -w

use strict;
use warnings;
use HTML::TableExtract qw(tree);
use Data::Dumper;

my $table_string = <<EOF;
<table width="100%" bgcolor="#ffffff">
  <tr>
    <td>Larry &amp; Gloria</td>
    <td>Mountain View</td>
    <td>California</td>
  </tr>
  <tr>
    <td><b>Tom</b></td>
    <td>Boulder</td>
    <td>Colorado</td>
  </tr>
  <tr>
    <td>Nathan &amp; Jenine</td>
    <td>Fort Collins</td>
    <td>Colorado</td>
  </tr>
</table>
EOF


# keep_html => 1 asks the extractor to retain markup inside the cells
my $te = HTML::TableExtract->new(keep_html => 1);
$te->parse($table_string);

# Tree mode (qw(tree) above): fetch the table as an element tree
my $table      = $te->first_table_found;
my $table_tree = $table->tree;
my $table_html = $table_tree->as_HTML;
my $table_text = $table_tree->as_text;

print Dumper($table_html), "\n";
print Dumper($table_text), "\n";

I was too lazy to dig up a table earlier, but I ran across one I could paste in while I was reading, so I went ahead and tested it. If you change keep_html=>1 to keep_html=>0, then as_HTML will strip all the markup except the table tags, and as_text will strip out the table tags too.
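If you want to see what keep_html does without going through the tree, here is a minimal sketch that loads the module without qw(tree) and prints the cells row by row for both settings. The extract_rows helper and the cut-down two-row table are my own for illustration, not from the docs, and I have not listed the output here, so run it and compare for yourself.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;    # no qw(tree): cells come back as plain strings

# Cut-down table, made up for this illustration
my $html = <<'EOF';
<table>
  <tr><td><b>Tom</b></td><td>Boulder</td></tr>
  <tr><td>Larry &amp; Gloria</td><td>Mountain View</td></tr>
</table>
EOF

# Hypothetical helper: parse $html with the given keep_html setting
# and return the rows of the first table found (each row is an array ref).
sub extract_rows {
    my ($source, $keep_html) = @_;
    my $te = HTML::TableExtract->new(keep_html => $keep_html);
    $te->parse($source);
    my $table = $te->first_table_found
        or die "no table found\n";
    return $table->rows;
}

for my $keep (0, 1) {
    print "keep_html => $keep\n";
    for my $row (extract_rows($html, $keep)) {
        # Empty cells can be undef, so substitute an empty string
        print join(' | ', map { defined $_ ? $_ : '' } @$row), "\n";
    }
    print "\n";
}
```

With keep_html => 0 the cell strings should be the visible text only; with keep_html => 1 the inline markup (the <b> tags here) should stay in the string.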
