Mr Bigglesworth has asked for the wisdom of the Perl Monks concerning the following question:
Hi All
I am trying to get data out of some HTML tables, which is largely working. I am having two issues that I have no idea how to resolve, and Google has not helped. I am very new to Perl and programming, so I don't even know what I should be looking for to solve them.
Issue 1: HTML::TableExtract cannot pick up any data for the first cell. I guess this is because the cell contains only an HTML image tag pointing to a *.png file, for example (line 380):
<td><img class="minimizeStyle" src="http://www.risa.com.au/JockeySilks/58035.png" /></td>
What I would like to do is populate the first cell in each row with the file name (58035.png in this case).
Issue 2: Where HTML::TableExtract finds an empty cell, it ignores it, for example (line 391):
<td style="text-align:center"></td>
Instead of ignoring it, I would like a "," to be inserted so that when a CSV file is (eventually) created, everything lines up.
The code I have at this stage is:
    use strict;
    use warnings;
    use HTML::TableExtract;
    use LWP::Simple;

    my $html = get("http://www.risa.com.au/FreeFields/Results.aspx?Key=2013Aug19,VIC,Echuca");

    my $te = HTML::TableExtract->new;
    $te->parse($html);

    # Table parsed, extract the data
    binmode STDOUT, ":utf8";
    foreach my $table ( $te->tables ) {
        foreach my $row ( $table->rows ) {
            my @values = grep {defined} @$row;
            print " ", join(',', @values), "\n";
        } # Foreach
    } # Foreach
I hope someone is able to point me in the right direction.
Cheers
Mr Bigglesworth
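
A minimal sketch of one way to approach both issues described above. It is not taken from any reply in this thread. It assumes the empty cells come back as undef (which would explain why `grep {defined}` drops them) and that the raw HTML of the first cell is available, for instance through the `keep_html` option of HTML::TableExtract. The helper names `image_name` and `row_to_csv` and the sample row are hypothetical, made up purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: if the cell holds an <img src="..."> tag, return the
# last path component of the src URL; otherwise return the cell unchanged.
sub image_name {
    my ($cell) = @_;
    return $cell unless defined $cell;
    if ( $cell =~ m{<img\b[^>]*\bsrc="(?:[^"]*/)?([^"/]+)"}i ) {
        return $1;
    }
    return $cell;
}

# Hypothetical helper: turn one row (array ref) into a comma-separated line.
# Undefined cells become empty strings instead of being dropped, so every
# row keeps the same number of columns.
sub row_to_csv {
    my ($row) = @_;
    my @cells = map { defined $_ ? $_ : '' } @$row;
    $cells[0] = image_name( $cells[0] ) if @cells;
    return join ',', @cells;
}

# Made-up sample row standing in for what $table->rows might return.
my @rows = (
    [
        '<img class="minimizeStyle" src="http://www.risa.com.au/JockeySilks/58035.png" />',
        '1', 'Some Horse', undef, '58.5',
    ],
);

print row_to_csv($_), "\n" for @rows;
```

Note that with `keep_html` switched on, the other cells may also carry markup that would need stripping, and a plain `join` does not quote values that themselves contain commas, so a dedicated CSV module may be the safer choice for real output.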
Replies are listed 'Best First'.
Re: HTML::TableExtract issues
by choroba (Cardinal) on Aug 24, 2013 at 07:43 UTC
by Mr Bigglesworth (Initiate) on Aug 24, 2013 at 12:26 UTC
by poj (Abbot) on Aug 24, 2013 at 16:17 UTC
by Mr Bigglesworth (Initiate) on Aug 26, 2013 at 12:50 UTC
by Laurent_R (Canon) on Aug 24, 2013 at 15:29 UTC
by Mr Bigglesworth (Initiate) on Aug 26, 2013 at 12:24 UTC