As part of a project to scrape a web page and pull the data out of a table for further analysis, I wrote the code below (borrowing from public examples that I don't always understand; yes, I'm new to Perl). However, when I run it, I get the following error, which I don't quite understand:

Can't locate object method "open" via package "GLOB" at C:/Perl/lib/IO/File.pm line 163.

I tried to follow this in the debugger, but it exited at the point of the error. Since my program doesn't have 163 lines, the error must be raised inside the IO::File module, which I thought was standard. I'm running ActiveState Perl 5.10 on XP SP2. I installed nmake 1.5 from Microsoft in order to install the HTML::TableExtract module from CPAN. The web page is cited in the code if you are curious.

#!/usr/bin/perl -w
# based on: extract-table.pl,v 24.1 2006/10/21 01:19:37 from Raman @ Koders.com
# Accepts a URI and table spec; returns a csv file
use strict;
use FileHandle;
use LWP::UserAgent;
use HTML::TableExtract;
#use IO::File;
use Getopt::Long;
use WWW::Mechanize;
use vars qw (%options);
my ($url, $file, $task, $depth, $count, $cols);

my %options = (task => \$task,
               url => \$url,
               file => \$file,
               depth => \$depth,
               count => \$count,
               headers => \$cols);
GetOptions (\%options,
            'file=s',
            'url=s',
            'task=s',
            'depth=i',
            'count=i',
            'headers=s');

# get the data from the web.  Typically this is http://www.sailwx.info/shiptrack/cruiseships.phtml
# either pass this in as --url <page_url> when invoking or just set it.

$cols = "Ship,'last reported (UTC)',position,Callsign";
$url = "http://www.sailwx.info/shiptrack/cruiseships.phtml";

my $input;
my $output = new OUTFILE ('>C:\Program Files\cron\Cruise Ships\ship_data.csv');
open (OUTFILE, '>C:\Program Files\cron\Cruise Ships\ship_data.csv');

my $m = WWW::Mechanize->new();
   $m->get($url);
   $input = $m->content;
print (OUTFILE $input); 

my $te;
if ( defined ($cols)) 
{
  my @headers = split(',', $cols);
  $te = new HTML::TableExtract(headers=>\@headers);
} else 
   {
    $te = new HTML::TableExtract( depth => $depth, count=>$count); 
   }
$te->parse_file($input);
 

my ($ts,$row);
foreach $ts ($te->table_states) 
{
   foreach $row ($ts->rows) 
   {
      $output->print ( join(',', @$row), "\n");
   }
}

close (OUTFILE);

if (defined ($url)) {
  unlink ($input);
}

The code above is a work in progress, of course. I'm just trying to scrape the page, find the table data, and write it to a CSV file so I can query it through DBI (with DBD::CSV) and produce a text file that helps me track airplanes in flight or cruise ships (this particular project) on my laptop's screen background.
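
To make the goal concrete, the CSV-writing step on its own might look something like the minimal sketch below. It is not the script that produced the error, only an illustration under stated assumptions: the helper names csv_line and write_csv are hypothetical, the rows are made-up placeholders standing in for what $ts->rows would return, and the output name ship_data_example.csv (written to the current directory) is invented for the example. It writes through an IO::File object rather than a bareword file handle.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::File;

# Turn one row (a list of cell values) into a single CSV line.
# Undefined cells become empty strings; cells containing a comma,
# double quote, or line break are quoted, with embedded quotes doubled.
sub csv_line {
    my @fields = map { defined $_ ? $_ : '' } @_;
    for my $field (@fields) {
        if ($field =~ /[",\r\n]/) {
            $field =~ s/"/""/g;
            $field = qq{"$field"};
        }
    }
    return join(',', @fields);
}

# Write rows (an array ref of array refs) to $path, one CSV line per row.
# Dies if the file cannot be opened or closed; returns the row count.
sub write_csv {
    my ($path, $rows) = @_;
    my $out = IO::File->new($path, 'w')
        or die "Cannot open $path for writing: $!";
    $out->print(csv_line(@$_), "\n") for @$rows;
    $out->close or die "Cannot close $path: $!";
    return scalar @$rows;
}

# Placeholder rows, invented purely for illustration (not real ship data).
my @rows = (
    ['Example Ship One', '2008-Jan-01 0000', 'N 10 00, W 020 00', 'ABCD'],
    ['Example Ship Two', '2008-Jan-01 0100', undef,               'EFGH'],
);

# Hypothetical output file name, created in the current directory.
my $written = write_csv('ship_data_example.csv', \@rows);
print "Wrote $written rows\n";
```

The position cell in the first placeholder row contains a comma, so it is written in double quotes; the undefined cell in the second row comes out as an empty field instead of triggering a warning in join.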

In reply to File open problem with "GLOB" by mcoblentz
