A simple little module that lets you access the text inside HTML tables, including nested ones, through a multidimensional array. The HTML can come either from a variable (passed by reference) or from a file.

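Usage:
my $table = Table->parse_it(\$content);   # HTML held in a string
or
my $table = Table->parse_it($filename);   # HTML read from a file

Then:
print $table->[$table_idx][$row][$col];

Tables are numbered from 1 in the order their opening tags appear, so a nested table gets its own index. Rows and columns are counted from 1 as well.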
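
To make the index layout concrete, here is a minimal sketch that walks a structure of that shape. The sample data is hand-built and hypothetical (it is not real parser output), and cells() is just an illustrative helper name. It assumes unused slots, such as index 0, are simply undef.

```perl
use strict;
use warnings;

# Hypothetical, hand-built data in the shape $table->[$t][$r][$c].
# Slot 0 at each level is left undef because the counters start at 1.
my $table = [
    undef,
    [ undef, [ undef, 'Name', 'Qty' ], [ undef, 'apple', '3' ] ],   # table 1
    [ undef, [ undef, 'nested cell' ] ],                            # table 2
];

# Return "table/row/col: text" for every defined cell, skipping empty slots.
sub cells {
    my ($data) = @_;
    my @out;
    for my $t (0 .. $#$data) {
        my $rows = $data->[$t] or next;
        for my $r (0 .. $#$rows) {
            my $cols = $rows->[$r] or next;
            for my $c (0 .. $#$cols) {
                next unless defined $cols->[$c];
                push @out, "$t/$r/$c: $cols->[$c]";
            }
        }
    }
    return @out;
}

print "$_\n" for cells($table);
```

Skipping undef slots at every level matters because the arrays are sparse: a table index exists only once some text has been stored under it.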

package Table;

use strict;
use HTML::Parser;

## PRIVATE
my $table = [];          # $table->[$tb_idx][$row][$column] holds cell text
my $tb_count = 0;        # tables seen so far
my $tb_idx;              # table currently being filled
my $row;
my $column;
my $table_status = 0;    # nesting depth; 0 means outside any table
my @save;                # positions in the enclosing tables

sub new {
    my $type = shift;
    return bless $table, $type;
}

sub parse_it {
    my $self = shift;
    my $src  = shift;    # scalar ref: HTML string; plain scalar: filename

    my $p = HTML::Parser->new(
        api_version => 3,
        handlers    => [
            start => [ \&_start, "tagname" ],
            end   => [ \&_end,   "tagname" ],
            text  => [ \&_text,  "dtext" ],
        ],
        marked_sections => 1,
    );

    if (ref($src)) {
        $p->parse($$src) or return;
        $p->eof;         # flush any text still buffered
    }
    else {
        $p->parse_file($src) or return;
    }
    return $table;       # lets callers index the result directly
}

sub _start {
    my $tag = shift;
    if ($tag eq 'table') {
        push @save, [ $tb_idx, $row, $column ];
        $row = $column = 0;
        ++$tb_count;
        $tb_idx = $tb_count;
        ++$table_status;
    }
    $row++    if $tag eq 'tr';
    $column++ if $tag eq 'td';
}

sub _end {
    my $tag = shift;
    if ($tag eq 'table' && @save) {    # ignore a stray </table>
        ($tb_idx, $row, $column) = @{ pop @save };
        --$table_status;
    }
    $column = 0 if $tag eq 'tr';
}

sub _text {
    my $text = shift;
    $text =~ s/\xa0//g;                # drop every non-breaking space
    $table->[$tb_idx][$row][$column] .= $text
        if $table_status && $text =~ /\S/;
}

1;
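
How nested tables end up with their own index is easiest to see by replaying the bookkeeping by hand. The sketch below is not part of the module: trace() is a hypothetical helper that runs the same counter-and-stack logic over a hand-written event list instead of real HTML::Parser callbacks, and reports where each piece of text would be stored.

```perl
use strict;
use warnings;

# Replay a hand-written (hypothetical) event stream through the same
# counter-and-stack bookkeeping the handlers use, without HTML::Parser.
sub trace {
    my @events = @_;
    my ($count, $idx, $row, $col, $depth) = (0, undef, 0, 0, 0);
    my (@save, @out);
    for my $ev (@events) {
        my ($kind, $arg) = @$ev;
        if ($kind eq 'start') {
            if ($arg eq 'table') {
                push @save, [ $idx, $row, $col ];    # remember the outer position
                ($row, $col) = (0, 0);
                $idx = ++$count;                     # each table gets the next number
                ++$depth;
            }
            $row++ if $arg eq 'tr';
            $col++ if $arg eq 'td';
        }
        elsif ($kind eq 'end') {
            if ($arg eq 'table' && @save) {
                ($idx, $row, $col) = @{ pop @save }; # back to the outer cell
                --$depth;
            }
            $col = 0 if $arg eq 'tr';
        }
        elsif ($depth) {                             # text inside some table
            push @out, "[$idx][$row][$col] $arg";
        }
    }
    return @out;
}

# Events for:
# <table><tr><td>outer<table><tr><td>inner</td></tr></table>tail</td></tr></table>
my @out = trace(
    [ start => 'table' ], [ start => 'tr' ], [ start => 'td' ],
    [ text  => 'outer' ],
    [ start => 'table' ], [ start => 'tr' ], [ start => 'td' ],
    [ text  => 'inner' ],
    [ end   => 'td' ], [ end => 'tr' ], [ end => 'table' ],
    [ text  => 'tail' ],
    [ end   => 'td' ], [ end => 'tr' ], [ end => 'table' ],
);
print "$_\n" for @out;
```

The inner table is stored under index 2, and text that follows it lands back in the outer cell [1][1][1]. In the module itself that trailing text is appended to whatever the cell already holds.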

In reply to Table.pm: Extract text from html tables by zzspectrez
