Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I'm fairly new to Perl and I'm trying to grab data from a table in an HTML file I have on my local hard drive. The file is called 'ops_tran_tracking.html' and the column headers are 'Account Number', 'Fraud Transfer', etc. (see my code below). What am I doing wrong? I get no errors, and nothing prints on my screen. Notice that I put a line break after 'Disputes' and after 'After' in the last two columns, because in the HTML file that text is on the next line. Below is the HTML source code for the file.
<th class=" Header" scope="col">Account Number</th>
<th class="r Header" scope="col">Fraud Transfer<br>Date</th>
<th class="r Header" scope="col">Balance</th>
<th class="r Header" scope="col">No. of Disputes<br>After Transfer</th>
<th class="r Header" scope="col">Disputed Dollars After<br>Transfer</th>
Thanks in advance. Below is the Perl code.
#!/usr/bin/perl
use HTML::TableExtract;
$te = HTML::TableExtract->new( headers => [qw('Account umber' 'Fraud Transfer' 'Date' 'Balance' 'No. of Disputes After Transfer' 'Disputed Dollars After Transfer')] );
$html_string = 'ops_tran_tracking.html';
$te->parse($html_string);
# Examine all matching tables
foreach $ts ($te->tables) {
    print "Table (", join(',', $ts->coords), "):\n";
    foreach $row ($ts->rows) {
        print join(',', @$row), "\n";
    }
}

Re: extract data using HTML::TableExtract
by pg (Canon) on Oct 09, 2005 at 05:38 UTC

    If you want to parse a file, use parse_file(); parse() is for parsing an HTML string. Use table_states(), not tables(), which is deprecated.

    use Data::Dumper;
    use strict;
    use warnings;
    use HTML::TableExtract;

    my $te = HTML::TableExtract->new(headers => ['Name', 'Phone Number']);
    $te->parse("<table><tr><td>Name</td><td>Phone Number</td></tr>"
             . "<tr><td>Tom</td><td>1234</td></tr>"
             . "<tr><td>Mary</td><td>4321</td></tr></table>");
    foreach my $ts ($te->table_states) {
        print "Table (", join(',', $ts->coords), "):\n";
        foreach my $row ($ts->rows) {
            print join(',', @$row), "\n";
        }
    }

    This gives:

    Table (0,0):
    Tom,1234
    Mary,4321
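To make the parse()/parse_file() distinction concrete, here is a small sketch that writes a made-up table to a temporary file (using the core File::Temp module) and feeds it to HTML::TableExtract both ways. Passing the file *name* to parse() makes the module parse the name itself as HTML, so no table is found and nothing is reported, which matches the "no errors, nothing prints" symptom in the question. The sketch uses tables(), the accessor documented in HTML::TableExtract 2.x, where table_states() remains as a deprecated alias; on an older 1.x install, table_states() may be the one to use.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use HTML::TableExtract;

# Made-up sample table, written to a temporary file for the demo.
my $html = '<table><tr><th>Name</th><th>Phone Number</th></tr>'
         . '<tr><td>Tom</td><td>1234</td></tr></table>';
my ($fh, $filename) = tempfile(SUFFIX => '.html', UNLINK => 1);
print {$fh} $html;
close $fh or die "Cannot close $filename: $!";

# Wrong: parse() treats its argument as HTML text, so only the
# file *name* is parsed and no table is found.
my $te_wrong = HTML::TableExtract->new(headers => ['Name', 'Phone Number']);
$te_wrong->parse($filename);
$te_wrong->eof;
my @from_parse = $te_wrong->tables;

# Right: parse_file() opens the file and parses its contents.
my $te_right = HTML::TableExtract->new(headers => ['Name', 'Phone Number']);
$te_right->parse_file($filename) or die "Cannot read $filename: $!";
my @from_file = $te_right->tables;

printf "parse(): %d table(s), parse_file(): %d table(s)\n",
    scalar @from_parse, scalar @from_file;
```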
      Sorry if I confused anyone, but I need to parse the actual tables in the HTML file, not the HTML source code. I only added the source code so you would understand what the table looks like. Thanks, everyone...
Re: extract data using HTML::TableExtract
by johnnywang (Priest) on Oct 09, 2005 at 06:40 UTC
    I think you cannot use two separate headers (Fraud Transfer, Date) to match the single header "Fraud Transfer<br>Date". You don't need to specify the full header, just enough to distinguish it, so you can use "Fraud Transfer" to match the full header "Fraud Transfer<br>Date". The same goes for the last two columns.
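A small sketch of the partial-match idea, using the header row from the question plus one invented data row. As I understand the HTML::TableExtract documentation, each header is matched as a pattern against the header cell's text, so 'Fraud Transfer' also matches the cell that reads "Fraud Transfer<br>Date". The account data is made up, and the sketch assumes HTML::TableExtract 2.x for tables().

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;

# Header row from the question plus one invented data row.
my $html = join '',
    '<table><tr>',
    '<th class=" Header" scope="col">Account Number</th>',
    '<th class="r Header" scope="col">Fraud Transfer<br>Date</th>',
    '<th class="r Header" scope="col">Balance</th>',
    '<th class="r Header" scope="col">No. of Disputes<br>After Transfer</th>',
    '<th class="r Header" scope="col">Disputed Dollars After<br>Transfer</th>',
    '</tr>',
    '<tr><td>1001</td><td>10/01/2005</td><td>250.00</td><td>2</td><td>75.00</td></tr>',
    '</table>';

# Only the text before each <br> is given; a partial match is enough.
my $te = HTML::TableExtract->new(
    headers => ['Account Number', 'Fraud Transfer', 'Balance',
                'No. of Disputes', 'Disputed Dollars'],
);
$te->parse($html);
$te->eof;

# Collect the data rows found under the matched header row.
my @rows;
for my $ts ($te->tables) {
    push @rows, $ts->rows;
}
print join(',', @$_), "\n" for @rows;
```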
      And don't forget to use slice_columns => 0; that way you won't have to list all of your headers.
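A sketch of the slice_columns => 0 suggestion: a single header is enough to locate the table, and every column of the matching rows comes back instead of only the columns named in headers. The table is an invented, cut-down version of the one in the question, and the sketch assumes HTML::TableExtract 2.x for tables(). Whether the header row itself appears among the returned rows is not something this sketch relies on; it only inspects the last row.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;

# Invented three-column table; only one header is named below.
my $html = '<table>'
    . '<tr><th>Account Number</th><th>Fraud Transfer<br>Date</th><th>Balance</th></tr>'
    . '<tr><td>1001</td><td>10/01/2005</td><td>250.00</td></tr>'
    . '</table>';

# One header identifies the table; slice_columns => 0 keeps all columns.
my $te = HTML::TableExtract->new(
    headers       => ['Account Number'],
    slice_columns => 0,
);
$te->parse($html);
$te->eof;

my ($ts) = $te->tables;
die "No matching table found\n" unless $ts;
my @rows = $ts->rows;
my $last = $rows[-1];    # the last (data) row
print join(',', @$last), "\n";
```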
Re: extract data using HTML::TableExtract
by EvanCarroll (Chaplain) on Oct 09, 2005 at 07:30 UTC
    My way probably isn't the best for this task, but I always use HTML::TokeParser::Simple.
    The idiom is:
    # $p is an HTML::TokeParser::Simple object
    while ( my $t = $p->get_token ) {
        if ( $t->is_start_tag('b') ) {
            while ( my $t = $p->get_token ) {
                last if $t->is_end_tag('/b');
                if ( $t->is_text ) {
                    print $t->as_is;    # text inside <b>...</b>
                }
            }
        }
    }

    The simple pattern of while ( get_new_token ), if ( what_i_want ), while ( get_new_token ), last if ( the_end_of_what_i_want ) makes things very easy, even if slightly tedious. Very SAX-ish.
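Applied to a table, the same idiom can collect the text of each cell. This is a sketch with an invented table: it keys on <tr> and <td> instead of <b>, and it uses as_is on text tokens, which returns the raw text (HTML entities are not decoded). Passing a scalar reference to new() makes the parser read the string itself rather than a file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

# Invented sample table.
my $html = '<table>'
    . '<tr><th>Account Number</th><th>Balance</th></tr>'
    . '<tr><td>1001</td><td>250.00</td></tr>'
    . '<tr><td>1002</td><td>80.00</td></tr>'
    . '</table>';

my $p = HTML::TokeParser::Simple->new(\$html);

my @rows;
while ( my $t = $p->get_token ) {
    if ( $t->is_start_tag('tr') ) {
        push @rows, [];                      # start a new row
    }
    elsif ( $t->is_start_tag('td') ) {
        my $cell = '';
        while ( my $inner = $p->get_token ) {
            last if $inner->is_end_tag('/td');
            $cell .= $inner->as_is if $inner->is_text;
        }
        push @{ $rows[-1] }, $cell;          # add the cell to the current row
    }
}

# The header row has only <th> cells, so it stays empty; drop it.
@rows = grep { @$_ } @rows;
print join(',', @$_), "\n" for @rows;
```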


    Evan Carroll
    www.EvanCarroll.com
Re: extract data using HTML::TableExtract
by polettix (Vicar) on Oct 09, 2005 at 14:50 UTC
    Your quoting doesn't do what you mean here:
    print "word: [$_]\n" foreach qw( 'Account umber' 'Fraud Transfer' 'Date' 'Balance' 'No. of Disputes After Transfer' 'Disputed Dollars After Transfer')
    __END__
    word: ['Account]
    word: [umber']
    word: ['Fraud]
    word: [Transfer']
    word: ['Date']
    word: ['Balance']
    word: ['No.]
    word: [of]
    word: [Disputes]
    word: [After]
    word: [Transfer']
    word: ['Disputed]
    word: [Dollars]
    word: [After]
    word: [Transfer']
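For comparison, a minimal core-Perl sketch that puts the qw() form next to an ordinary list of quoted strings. qw() splits on whitespace and keeps the quote characters as literal text, while a comma-separated list of quoted strings keeps each multi-word header as one element. The header names are taken from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# qw() splits on whitespace; the single quotes become part of the words.
my @from_qw = qw( 'Account Number' 'Fraud Transfer' 'Balance' );

# A normal list of quoted strings keeps each header intact.
my @from_list = ( 'Account Number', 'Fraud Transfer', 'Balance' );

printf "qw(): %d elements, first is [%s]\n", scalar @from_qw,   $from_qw[0];
printf "list: %d elements, first is [%s]\n", scalar @from_list, $from_list[0];
```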

    Flavio
    perl -ple'$_=reverse' <<<ti.xittelop@oivalf

    Don't fool yourself.
      So if I need to grab the data from the table, how would I quote the headers so that it understands that each column name has multiple words? The first column is "Account Number", the second is "Fraud Transfer", etc. Once again, sorry if I confused anyone by posting the HTML source code. I need the information that's in the table, not the HTML source code.

        What frodo72 was trying to say is that your use of quotes inside qw() does not accomplish ... never mind, I am repeating frodo72. Don't use qw(); use a plain old comma-separated list instead:

        'headers' => [( 'Account Number' , 'Fraud Transfer' , 'Date' , 'Balance' , 'No. of Disputes' , 'Disputed Dollars' )]
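Pulling the thread's advice together (parse_file() instead of parse(), a plain comma-separated list instead of qw(), and partial headers for the columns split by <br>), the original script might be rewritten roughly as follows. This is a sketch, not a tested solution for the real file: the helper name extract_tracking_rows is hypothetical, the header strings are guesses based on the HTML snippet in the question, and it assumes HTML::TableExtract 2.x for tables().

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical helper: returns the data rows found under the question's headers.
sub extract_tracking_rows {
    my ($file) = @_;
    my $te = HTML::TableExtract->new(
        headers => [
            'Account Number',
            'Fraud Transfer',      # also matches "Fraud Transfer<br>Date"
            'Balance',
            'No. of Disputes',     # header continues after a <br>
            'Disputed Dollars',    # header continues after a <br>
        ],
    );
    # parse_file() reads the file; parse() would only parse the name.
    $te->parse_file($file) or die "Cannot read $file: $!\n";

    my @rows;
    for my $ts ($te->tables) {
        push @rows, $ts->rows;
    }
    return @rows;
}

my $file = shift(@ARGV) // 'ops_tran_tracking.html';
if (-e $file) {
    for my $row (extract_tracking_rows($file)) {
        # Print undefined cells as empty strings.
        print join(',', map { defined $_ ? $_ : '' } @$row), "\n";
    }
}
else {
    warn "No such file: $file\n";
}
```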