table capture

by Kanishka (Beadle)
on Apr 18, 2005 at 04:35 UTC (#448731=perlquestion)

Kanishka has asked for the wisdom of the Perl Monks concerning the following question:

I use this code to capture a page from the web.
use LWP::Simple;
$reponse = get("");
open(INFILE, ">SCE.htm") or die "Can't open out File: $!\n";
print INFILE $reponse;
close INFILE;

I want to get each element of data in a row into separate variables so I can store them in a database. I have tried several times, but the HTML seems to be irregular and I'm a novice. :)

Replies are listed 'Best First'.
Re: table capture
by bobf (Monsignor) on Apr 18, 2005 at 05:03 UTC

    Parsing HTML can be tricky, which is why you should use a module to do it whenever possible. There are several HTML parsers on CPAN, and some are specifically designed to parse tables. For example, check out HTML::TableContentParser.

    In addition, since you seem to be downloading stock quotes, you might want to check out some of the modules in the Finance::Quote namespace. You might be able to use one of those to get the data and skip parsing the table completely.

    If you need more help, feel free to ask. Using modules will definitely make it easier.
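
As a rough illustration of the module approach, here is a minimal sketch using HTML::TableContentParser. The sample markup and the "symbol, price" column layout are made up for the example (the real page is not shown in this thread), and the whitespace trimming is only a precaution for irregular HTML.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableContentParser;

# Hypothetical sample markup standing in for the downloaded page.
my $html = '<table>'
         . '<tr><td>ABC</td><td>12.50</td></tr>'
         . '<tr><td> XYZ </td><td>7.25</td></tr>'
         . '</table>';

my $parser = HTML::TableContentParser->new();
my $tables = $parser->parse($html);    # array ref, one hash per table

my @records;
for my $table (@$tables) {
    for my $row (@{ $table->{rows} || [] }) {
        # Collect the text of each cell, trimming stray whitespace.
        my @cells;
        for my $cell (@{ $row->{cells} || [] }) {
            my $text = defined $cell->{data} ? $cell->{data} : '';
            $text =~ s/^\s+|\s+$//g;
            push @cells, $text;
        }
        next unless @cells;            # skip rows with no data cells
        push @records, \@cells;
    }
}

# Each record is now an array ref that can be unpacked into variables.
for my $record (@records) {
    my ($symbol, $price) = @$record;
    print "$symbol => $price\n";
}
```

Collecting the rows into @records first keeps the parsing separate from whatever is done with the data afterwards, such as a database insert.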


Re: table capture
by zentara (Archbishop) on Apr 18, 2005 at 12:29 UTC
    Here is an example using HTML::TableExtract. Look at the printout, and you should be able to regex it (or deduce the array structure) and assign it to variables.
    #!/usr/bin/perl
    use HTML::TableExtract;
    use LWP::Simple;
    use Data::Dumper;

    my $te = new HTML::TableExtract(gridmap=>1);
    my $content = get("");
    $te->parse($content);

    foreach $ts ($te->table_states) {
        foreach $row ($ts->rows) {
            #print Dumper $row;
            print @{$row},"\n";
        }
    }
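
Once the array structure is known, each row can be unpacked straight into variables. The sketch below assumes a made-up table with "Symbol" and "Price" column headers; the real page's headers are not shown in this thread, so those names are placeholders. It also uses the tables method, which I believe is the current name for table_states in recent HTML::TableExtract releases.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical sample markup; the real page's columns are an assumption.
my $html = '<table>'
         . '<tr><th>Symbol</th><th>Price</th></tr>'
         . '<tr><td>ABC</td><td>12.50</td></tr>'
         . '<tr><td>XYZ</td><td>7.25</td></tr>'
         . '</table>';

# Select only tables that have these column headers; cells come back
# in the same order as the headers listed here.
my $te = HTML::TableExtract->new(headers => ['Symbol', 'Price']);
$te->parse($html);
$te->eof;

my @quotes;
for my $table ($te->tables) {
    for my $row ($table->rows) {
        my ($symbol, $price) = @$row;
        next unless defined $symbol && defined $price;
        s/^\s+|\s+$//g for $symbol, $price;    # trim stray whitespace
        push @quotes, { symbol => $symbol, price => $price };
    }
}

printf "%-6s %s\n", $_->{symbol}, $_->{price} for @quotes;
```

Each hash in @quotes holds one row, so it can be passed to a database insert one record at a time.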

    I'm not really a human, but I play one on earth. flash japh
Re: table capture
by Ben Win Lue (Friar) on Apr 18, 2005 at 11:42 UTC
    This is not an answer to your question, but your code would be easier to read if you had named your filehandle OUTFILE, since it is opened for writing.
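
Along the same lines, here is one possible way to write the download step with an output-named filehandle. The save_page helper is a hypothetical name, not something from the original post, and because the URL is elided there, this sketch takes it from the command line instead of guessing it. It also checks get's return value, since LWP::Simple's get returns undef on failure.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple qw(get);

# Hypothetical helper: fetch $url and write the body to $file.
# Returns the length of the fetched content.
sub save_page {
    my ($url, $file) = @_;

    my $response = get($url);
    die "Could not fetch $url\n" unless defined $response;

    # Lexical filehandle, named for what it does, with 3-argument open.
    open(my $outfile, '>', $file)
        or die "Can't open $file for writing: $!\n";
    print {$outfile} $response;
    close($outfile) or die "Can't close $file: $!\n";

    return length $response;
}

# Usage: perl fetch.pl URL  (the URL itself is not given in the thread)
save_page($ARGV[0], 'SCE.htm') if @ARGV;
```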

Node Type: perlquestion [id://448731]
Approved by thor