    # get the data from the web. Typically this is:
    # http://www.sailwx.info/shiptrack/cruiseships.phtml
    # Either pass this in as --url <page_url> when invoking or just set it.
    $cols = 'Ship,last reported (UTC),position,Callsign';
    $url = "http://www.sailwx.info/shiptrack/cruiseships.phtml";
    my $input;
    my $out_fn = 'C:\\Program Files\\cron\\Cruise Ships\\ship_data.csv';
    open(my $out_fh, '>', $out_fn) or die("Unable to create output file \"$out_fn\": $!\n");
    my $m = WWW::Mechanize->new();
    $m->get($url);
    $input = $m->content;
    my $te;
    if ( defined ($cols)) {
        my @headers = split(/,/, $cols);
        # find the table by its border attribute rather than by headers
        $te = HTML::TableExtract->new( attribs => { border => 1 } );
    }
    else {
        $te = new HTML::TableExtract( depth => $depth, count=>$count);
    }
    $te->parse($input);

Neither of the following two extraction approaches seemed to work on the HTML. I had to find the table by border (as you can see in the code above). This results in something more fragile than I would like.
I thought at least one of these two lines would work but they don't:
    $te = new HTML::TableExtract ( headers => [qw(Ship position)] );
    $te = new HTML::TableExtract(headers=>\@headers);
I even tried this line...

    $te = HTML::TableExtract->new( headers => \@headers);

... but it doesn't work either. Well, I didn't really think it was a problem with spaces. I don't know what the problem is.
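One thing that may be worth ruling out (a guess, not a confirmed diagnosis for this page): according to the HTML::TableExtract documentation, header strings are run through a non-anchored, case-insensitive regular expression, so the parentheses in 'last reported (UTC)' would act as a regex group instead of matching literal parentheses. The sketch below escapes each header with quotemeta before passing it in. The inline HTML is a hypothetical stand-in for the real page, and the row data in it is made up for illustration.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical stand-in for the real page: one bordered table whose
# header row uses the same column names as $cols.
my $html = <<'HTML';
<table border="1">
<tr><th>Ship</th><th>last reported (UTC)</th><th>position</th><th>Callsign</th></tr>
<tr><td>Example Ship</td><td>2009-Jan-01 1200</td><td>N 25 W 080</td><td>ABCD</td></tr>
</table>
HTML

my $cols = 'Ship,last reported (UTC),position,Callsign';

# Escape regex metacharacters so "(UTC)" is matched literally.
my @headers = map { quotemeta } split /,/, $cols;

my $te = HTML::TableExtract->new( headers => \@headers );
$te->parse($html);

# Collect each data row of every matching table as a CSV-style line.
my @csv_lines;
for my $ts ( $te->tables ) {
    for my $row ( $ts->rows ) {
        push @csv_lines, join( ',', map { defined $_ ? $_ : '' } @$row );
    }
}
print "$_\n" for @csv_lines;
```

If no table matches even with escaped headers, the header text on the live page may simply differ from what is in $cols (for example extra markup or different wording), in which case matching on attribs as in the original code remains the fallback.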
In reply to Extracted HTML table by mcoblentz