in reply to Toke Parser get table

Hi datarecall!

This assumes there is only one table with class="details".

#!/usr/bin/perl
use warnings;
use strict;
use HTML::TokeParser;

my $html = do{local $/;<DATA>};
my $p = HTML::TokeParser->new(\$html);

# find the start of the table
while (my $t = $p->get_token){
    last if(
        $t->[0] eq q{S}
        and $t->[1] eq q{table}
        and ${$t->[2]}{class}
        and ${$t->[2]}{class} eq q{details}
    );
}

my ($in_td);
while (my $t = $p->get_token){
    # quit loop when we find the end of the table
    last if(
        $t->[0] eq q{E}
        and $t->[1] eq q{table}
    );

    $in_td++, next if $t->[0] eq q{S} and $t->[1] eq q{td};
    $in_td--, next if $t->[0] eq q{E} and $t->[1] eq q{td};

    next unless $in_td;

    # if we are here we must be in a td tag
    if ($t->[0] eq q{T}){
        print qq{**$t->[1]**\n};
    }
}
__DATA__
<html>
  <head>
    <title>table</title>
  </head>
  <body>
    <table class="details">
      <tr>
        <td>detail 1</td>
        <td>detail 2</td>
      </tr>
      <tr>
        <td>detail 3</td>
        <td>detail 4</td>
      </tr>
    </table>
  </body>
</html>

**detail 1**
**detail 2**
**detail 3**
**detail 4**
(and enclose your code with <c>...</c> tags)

Replies are listed 'Best First'.
Re^2: Toke Parser get table
by ikegami (Patriarch) on Mar 17, 2009 at 20:19 UTC

    ${$t->[2]}{class} is weird. You're mixing -> and ${} dereferencing syntaxes. I'd use $t->[2]{class}.

    Also, you incorrectly assume TD end tags are required. Don't write your own parser! (HTML::TokeParser is a lexer, not a parser.)
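
If you do stay with HTML::TokeParser, here is a minimal sketch of one way to avoid depending on </td>. It uses the simpler $t->[2]{class} form to find the table, then reads each cell's text up to whichever of the next td, /td, tr, /tr or /table tags comes first. The helper name cells_of_details_table is made up for this example, and the sketch assumes there are no nested tables; it is not a substitute for a real table parser.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: returns the trimmed text of every cell in the
# first <table class="details">. Assumes no nested tables.
sub cells_of_details_table {
    my ($html) = @_;
    my $p = HTML::TokeParser->new(\$html) or die "cannot create parser";

    # Find the start of the table; get_token puts attributes at index 2.
    my $found;
    while (my $t = $p->get_token) {
        next unless $t->[0] eq 'S' and $t->[1] eq 'table';
        if (($t->[2]{class} // '') eq 'details') {
            $found = 1;
            last;
        }
    }
    return unless $found;

    my @cells;
    # Jump from one <td> to the next; stop at the end of the table.
    while (my $tag = $p->get_tag('td', '/table')) {
        last if $tag->[0] eq '/table';
        # Read text up to the next cell/row boundary, whether or not
        # the author wrote an explicit </td>.
        push @cells, $p->get_trimmed_text('td', '/td', 'tr', '/tr', '/table');
    }
    return @cells;
}

# Usage: the second table omits most of its </td> and </tr> tags.
my $sample = <<'HTML';
<table class="other"><tr><td>skip</td></tr></table>
<table class="details">
  <tr><td>detail 1<td>detail 2
  <tr><td>detail 3</td><td>detail 4
</table>
HTML

print "**$_**\n" for cells_of_details_table($sample);
```

Passing several tag names to get_tag and get_trimmed_text needs a reasonably recent HTML::Parser distribution; older releases accepted only a single end tag.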

Re^2: Toke Parser get table
by datarecall (Initiate) on Mar 17, 2009 at 19:51 UTC
    Thanks for the response, but why can't you use get_tag and then test for the class the way I did in the first post? Just wondering what I did wrong there.
      Attributes are in ->[2], not ->[1]
        I changed it from 1 to 2; however, it is still not showing the class of the table. I appreciate all the help.

        <c>
        $te = HTML::TableExtract->new();
        $te->parse($mech->content);
        print "FUCK";
        while (my $table = $stream->get_tag("table")){
            print $table->[2]{class};
            if($table->[2]{class} eq 'details'){
                print "HERE";
                while(my $row = $table->get_tag("td")){
                    print $table->get_trimmed_text("/td");
                }
            }
        }
        </c>
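
A note on the index, going by the HTML::TokeParser documentation: get_token returns start tags as ["S", $tag, $attr, $attrseq, $text], while get_tag returns the same thing without the leading type code, i.e. [$tag, $attr, $attrseq, $text]. So the attribute hash sits at index 2 for get_token but at index 1 for get_tag. Also, get_tag and get_trimmed_text are methods of the parser object, not of the array ref that get_tag hands back. The sketch below shows both access styles side by side; the sub names and the sample HTML are made up for illustration, and it reads from a string rather than $mech->content.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# get_token: [ 'S', $tag, $attr, $attrseq, $text ] -> attributes at index 2
sub class_via_get_token {
    my ($html) = @_;
    my $p = HTML::TokeParser->new(\$html) or die "cannot create parser";
    while (my $t = $p->get_token) {
        return $t->[2]{class} if $t->[0] eq 'S' and $t->[1] eq 'table';
    }
    return;
}

# get_tag: [ $tag, $attr, $attrseq, $text ] -> attributes at index 1
sub class_via_get_tag {
    my ($html) = @_;
    my $p = HTML::TokeParser->new(\$html) or die "cannot create parser";
    my $tag = $p->get_tag('table') or return;
    return $tag->[1]{class};
}

# Cells of the first table with class="details", using get_tag only.
# Every method call goes through the parser $p, never through $table.
# Assumes explicit </td> tags and no nested tables.
sub details_cells {
    my ($html) = @_;
    my $p = HTML::TokeParser->new(\$html) or die "cannot create parser";
    while (my $table = $p->get_tag('table')) {
        next unless ($table->[1]{class} // '') eq 'details';
        my @cells;
        while (my $tag = $p->get_tag('td', '/table')) {
            last if $tag->[0] eq '/table';
            push @cells, $p->get_trimmed_text('/td');
        }
        return @cells;
    }
    return;
}

my $sample = '<table class="details"><tr><td>detail 1</td><td>detail 2</td></tr></table>';

print class_via_get_token($sample), "\n";
print class_via_get_tag($sample), "\n";
print "$_\n" for details_cells($sample);
```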