mcoblentz has asked for the wisdom of the Perl Monks concerning the following question:
Neither of the following two extraction approaches seemed to work on this HTML, so I had to find the table by its border attribute instead (as you can see in the code below). The result is more fragile than I would like.

    use strict;
    use warnings;
    use WWW::Mechanize;
    use HTML::TableExtract;

    # Get the data from the web. Typically this is:
    #   http://www.sailwx.info/shiptrack/cruiseships.phtml
    # Either pass this in as --url <page_url> when invoking or just set it.
    my $cols = 'Ship,last reported (UTC),position,Callsign';
    my $url  = "http://www.sailwx.info/shiptrack/cruiseships.phtml";
    my ( $depth, $count );    # presumably set elsewhere in the full script (not in this excerpt)
    my $input;
    my $out_fn = 'C:\\Program Files\\cron\\Cruise Ships\\ship_data.csv';
    open( my $out_fh, '>', $out_fn )
        or die("Unable to create output file \"$out_fn\": $!\n");

    my $m = WWW::Mechanize->new();
    $m->get($url);
    $input = $m->content;

    my $te;
    if ( defined($cols) ) {
        my @headers = split( /,/, $cols );    # built, but not passed to new() here
        $te = HTML::TableExtract->new( attribs => { border => 1 } );
    }
    else {
        $te = HTML::TableExtract->new( depth => $depth, count => $count );
    }
    $te->parse($input);
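For reference, here is a minimal, self-contained sketch of the border-based extraction described above, run against a small inline HTML string instead of the live sailwx.info page. The sample markup, the ship data, and the helper names `extract_by_border` and `csv_line` are hypothetical. It also shows one way to turn the extracted rows into CSV lines, since the excerpt above opens `$out_fh` but never writes to it.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical sample page: a borderless layout table plus one bordered data table.
my $sample_html = join '',
    '<html><body>',
    '<table border="0"><tr><td>navigation</td></tr></table>',
    '<table border="1">',
    '<tr><th>Ship</th><th>last reported (UTC)</th><th>position</th><th>Callsign</th></tr>',
    '<tr><td>Example Star</td><td>Mar 20 06:00</td><td>25.1N 80.2W</td><td>ABC1</td></tr>',
    '</table>',
    '</body></html>';

# Return every row (as an array ref) of every table whose border attribute is 1.
sub extract_by_border {
    my ($html) = @_;
    my $te = HTML::TableExtract->new( attribs => { border => 1 } );
    $te->parse($html);
    my @rows;
    for my $ts ( $te->tables ) {
        push @rows, $ts->rows;
    }
    return @rows;
}

# Build one CSV line, quoting a field only if it contains a comma, quote, or newline.
sub csv_line {
    my @fields = map {
        my $f = defined $_ ? $_ : '';
        if ( $f =~ /[",\n]/ ) {
            $f =~ s/"/""/g;
            $f = qq{"$f"};
        }
        $f;
    } @_;
    return join( ',', @fields ) . "\n";
}

# Printed to STDOUT here; the original script would print to $out_fh instead.
for my $row ( extract_by_border($sample_html) ) {
    print csv_line(@$row);
}
```

Matching on `attribs` selects tables by markup rather than by content, which is why it breaks if the site changes its border attribute; the header-based options the question tries next select by the visible column titles instead.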
I thought at least one of these two lines would work, but neither does:
    $te = new HTML::TableExtract ( headers => [qw(Ship position)] );
    $te = new HTML::TableExtract(headers=>\@headers);
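To check the `headers` form in isolation, here is a minimal sketch that runs `headers => [qw(Ship position)]` against a small hypothetical table. The markup and the helper name `extract_by_headers` are made up for illustration and are not taken from the sailwx.info page. When the headers are found, HTML::TableExtract returns only those columns, in the order listed, for the rows beneath the header row.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical sample table using the column names from the question.
my $ships_html = join '',
    '<table border="1">',
    '<tr><th>Ship</th><th>last reported (UTC)</th><th>position</th><th>Callsign</th></tr>',
    '<tr><td>Example Star</td><td>Mar 20 06:00</td><td>25.1N 80.2W</td><td>ABC1</td></tr>',
    '<tr><td>Sample Sea</td><td>Mar 20 05:30</td><td>18.4N 66.1W</td><td>XYZ2</td></tr>',
    '</table>';

# Extract only the named columns; returns one array ref per data row.
sub extract_by_headers {
    my ( $html, @headers ) = @_;
    my $te = HTML::TableExtract->new( headers => \@headers );
    $te->parse($html);
    my @rows;
    push @rows, $_->rows for $te->tables;
    return @rows;
}

for my $row ( extract_by_headers( $ships_html, qw(Ship position) ) ) {
    print join( ' | ', @$row ), "\n";
}
```

If this works on the sample but finds nothing on the real page, the page's actual header text (as rendered in its `<th>`/`<td>` cells) is the next thing to compare against the strings being passed in.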
I even tried this line...

    $te = HTML::TableExtract->new( headers => \@headers);

...but it doesn't work either. I didn't really think the problem was spacing, though, and I don't know what the problem is.
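One thing worth ruling out with `headers => \@headers`: the HTML::TableExtract documentation describes header strings as case-insensitive regular expressions. If that holds for the installed version, splitting `$cols` yields the header `last reported (UTC)`, in which `(UTC)` is a regex capture group rather than literal parentheses, so that header should not match the cell text. A minimal sketch, using a hypothetical inline table and hypothetical helper names (`literal_headers`, `count_matching_tables`), escapes each header with Perl's built-in `quotemeta` before passing it in.

```perl
use strict;
use warnings;
use HTML::TableExtract;

my $cols = 'Ship,last reported (UTC),position,Callsign';

# Hypothetical sample table; the real page's markup may differ.
my $page = join '',
    '<table border="1">',
    '<tr><th>Ship</th><th>last reported (UTC)</th><th>position</th><th>Callsign</th></tr>',
    '<tr><td>Example Star</td><td>Mar 20 06:00</td><td>25.1N 80.2W</td><td>ABC1</td></tr>',
    '</table>';

# Split the column list and escape regex metacharacters such as "(" and ")".
sub literal_headers {
    my ($col_list) = @_;
    return map { quotemeta($_) } split /,/, $col_list;
}

# Count how many tables in $html match the given header patterns.
sub count_matching_tables {
    my ( $html, @headers ) = @_;
    my $te = HTML::TableExtract->new( headers => \@headers );
    $te->parse($html);
    my @tables = $te->tables;
    return scalar @tables;
}

my @raw     = split /,/, $cols;
my @escaped = literal_headers($cols);
print 'raw headers matched: ',     count_matching_tables( $page, @raw ),     " table(s)\n";
print 'escaped headers matched: ', count_matching_tables( $page, @escaped ), " table(s)\n";
```

Escaping keeps `$cols` readable as plain column titles while making each one match literally; `qw(Ship position)` avoids the issue only because neither word contains a metacharacter.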
Re: Extracted HTML table
by pc88mxer (Vicar) on Mar 20, 2008 at 06:34 UTC
Re: Extracted HTML table
by wfsp (Abbot) on Mar 20, 2008 at 07:14 UTC
by mcoblentz (Scribe) on Mar 20, 2008 at 14:24 UTC
Re: Extracted HTML table
by poolpi (Hermit) on Mar 20, 2008 at 07:34 UTC