datarecall has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

This is my 2nd day on Perl, so please forgive me if I am doing something stupid here. I am trying to find the table with class="details" and then get the text of every td within that table:
while (my $table = $stream->get_tag("table")) {
    print $table->[1]{class};
    if ($table->[1]{class} eq 'details') {
        print "HERE";
        while (my $row = $table->get_tag("td")) {
            print $table->get_trimmed_text("/td");
        }
    }
}

Replies are listed 'Best First'.
Re: Toke Parser get table
by wfsp (Abbot) on Mar 17, 2009 at 15:35 UTC
    Hi datarecall!

    This assumes there is only one table with class="details".

    #!/usr/bin/perl
    use warnings;
    use strict;
    use HTML::TokeParser;

    my $html = do{local $/;<DATA>};
    my $p = HTML::TokeParser->new(\$html);

    # find the start of the table
    while (my $t = $p->get_token){
        last if(
            $t->[0] eq q{S} and
            $t->[1] eq q{table} and
            ${$t->[2]}{class} and
            ${$t->[2]}{class} eq q{details}
        );
    }

    my ($in_td);
    while (my $t = $p->get_token){
        # quit loop when we find the end of the table
        last if(
            $t->[0] eq q{E} and
            $t->[1] eq q{table}
        );
        $in_td++, next if $t->[0] eq q{S} and $t->[1] eq q{td};
        $in_td--, next if $t->[0] eq q{E} and $t->[1] eq q{td};
        next unless $in_td;
        # if we are here we must be in a td tag
        if ($t->[0] eq q{T}){
            print qq{**$t->[1]**\n};
        }
    }
    __DATA__
    <html>
    <head>
    <title>table</title>
    </head>
    <body>
    <table class="details">
    <tr>
    <td>detail 1</td>
    <td>detail 2</td>
    </tr>
    <tr>
    <td>detail 3</td>
    <td>detail 4</td>
    </tr>
    </table>
    </body>
    </html>
    **detail 1**
    **detail 2**
    **detail 3**
    **detail 4**
    (and enclose your code with <c>...</c> tags)

      ${$t->[2]}{class} is weird. You're mixing -> and ${} dereferencing syntaxes. I'd use $t->[2]{class}.

      Also, you incorrectly assume TD end tags are required. Don't write your own parser! (HTML::TokeParser is a lexer, not a parser.)
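
To illustrate the point about optional end tags, here is a minimal sketch (not the code from either post) that closes a cell at any cell, row, or table boundary instead of waiting for `</td>`. The sub name `details_cells` is made up for this example, and the sketch assumes Perl 5.10+ (for `//`), no nested tables, and that only `td` cells are wanted. Text tokens from `get_token` are used as-is, so entities are not decoded.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: text of each <td> in the first table with
# class="details". Does not rely on </td> being present.
sub details_cells {
    my ($html) = @_;
    my $p = HTML::TokeParser->new(\$html) or die "cannot create parser";

    # Skip ahead to the wanted table. get_tag drops the type code,
    # so the attribute hash is at index 1 here.
    while (my $t = $p->get_tag('table')) {
        last if ($t->[1]{class} // '') eq 'details';
    }

    my (@cells, $cell);
    while (my $t = $p->get_token) {
        my ($type, $tag) = @$t;
        if ($type eq 'T') {
            # collect text only while a cell is open
            $cell .= $t->[1] if defined $cell;
            next;
        }
        next unless $type eq 'S' or $type eq 'E';
        next unless $tag =~ /^(?:td|th|tr|table)$/;

        # any cell, row, or table boundary closes the open cell
        if (defined $cell) {
            push @cells, $cell;
            undef $cell;
        }
        last if $type eq 'E' and $tag eq 'table';
        $cell = '' if $type eq 'S' and $tag eq 'td';
    }
    push @cells, $cell if defined $cell;

    # trim leading and trailing whitespace
    for (@cells) { s/^\s+//; s/\s+$// }
    return @cells;
}

print "**$_**\n"
    for details_cells('<table class="details"><tr><td>one<td>two</table>');
```

A real HTML parser (one that builds a tree) handles more cases than this, such as nested tables, which is the reason for the advice above.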

      Thanks for the response, but why can't you use get_tag and then test for the class the way I did in the first post? Just wondering what I did wrong there.
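        With get_token, attributes are in ->[2], not ->[1]. get_tag drops the leading type code, so $table->[1]{class} is correct there. What breaks your code is calling get_tag and get_trimmed_text on $table (a plain array reference) instead of on the parser object $stream.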
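
For reference, the HTML::TokeParser documentation gives the start-tag shape as `["S", $tag, $attr, $attrseq, $text]` for `get_token` and `[$tag, $attr, $attrseq, $text]` for `get_tag`. Below is a rough sketch of the `get_tag` approach from the original question with the method calls moved onto the parser object. The sub name `details_text` is invented for the example, `$stream` follows the naming in the question, and the sketch assumes Perl 5.10+ and that every cell has a `</td>`.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: trimmed text of every <td> inside tables
# whose class attribute is exactly "details".
sub details_text {
    my ($html) = @_;
    my $stream = HTML::TokeParser->new(\$html) or die "cannot create parser";

    my @text;
    while (my $table = $stream->get_tag('table')) {
        # get_tag returns [$tag, \%attr, ...]: attributes at index 1.
        # "// ''" avoids an undef warning for tables without a class.
        next unless ($table->[1]{class} // '') eq 'details';

        # Call methods on the parser, not on the token, and stop at
        # </table> so cells of later tables are not picked up.
        while (my $tag = $stream->get_tag('td', '/table')) {
            last if $tag->[0] eq '/table';
            push @text, $stream->get_trimmed_text('/td');
        }
    }
    return @text;
}

print "$_\n" for details_text(
    '<table class="details"><tr><td>detail 1</td><td>detail 2</td></tr></table>'
);
```

The `'/table'` argument to `get_tag` is what keeps the inner loop from running past the end of the table.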