puran_bair has asked for the wisdom of the Perl Monks concerning the following question:

Dear Masters of perl Wisdom,

I would like to read the HTML of a page on a website and extract certain nuggets of information.

Specifically, I want to repeatedly read stock quotes that are available free on some sites, and build a file of time and price that I can then import into Excel.

How do I read the HTML of a web page into a character string? Is there some variant of open that specifies a URL?

Complicating factor: I will need to post data to the web server, like my account number and the quote symbol.

I am grateful and eager for any crumbs of knowledge you might generously bestow.

Replies are listed 'Best First'.
Re: Reading a webpage
by fruiture (Curate) on Dec 31, 2002 at 17:54 UTC

    The problem consists of three parts. First, fetch the resource located by the URL. This is easily done with LWP::Simple; if you need more control (POST data, headers, cookies), use LWP::UserAgent directly. To build a simple HTTP request, use HTTP::Request::Common, and if you need still more control, use HTTP::Request.
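
A minimal sketch of the fetching step with a POST, since the question mentions sending an account number and a quote symbol. The URL in the usage note and the form field names (`account`, `symbol`) are placeholders, not any real site's form, and the helper names are made up for illustration.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request::Common qw(POST);

# Build a form-encoded POST request. The field names "account" and
# "symbol" are assumptions; use whatever the real form expects.
sub build_quote_request {
    my ($url, $account, $symbol) = @_;
    return POST $url, [ account => $account, symbol => $symbol ];
}

# Send the request and return the page body as a string, or die on failure.
sub fetch_quote_page {
    my ($url, $account, $symbol) = @_;
    my $ua       = LWP::UserAgent->new( timeout => 30 );
    my $response = $ua->request( build_quote_request($url, $account, $symbol) );
    die 'Request failed: ' . $response->status_line . "\n"
        unless $response->is_success;
    return $response->decoded_content;
}
```

Usage would be something like `my $html = fetch_quote_page('http://quotes.example.com/lookup', '12345', 'IBM');`, with the real URL substituted.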

    Second, once you've retrieved the resource, you must extract the data from the HTML document. To do that correctly, don't use regular expressions; use a parser built on HTML::Parser. The most straightforward is probably HTML::TokeParser.
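
A rough sketch of the extraction step with HTML::TokeParser, assuming the quote data sits in table cells. That is a guess about the page layout, and `extract_cells` is a made-up helper name.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Return the trimmed text of every <td> cell in an HTML string.
sub extract_cells {
    my ($html) = @_;
    # Passing a scalar reference makes the parser read from the string
    # instead of treating the argument as a file name.
    my $parser = HTML::TokeParser->new( \$html )
        or die "Cannot create parser\n";
    my @cells;
    while ( $parser->get_tag('td') ) {
        # Collect the text up to the matching </td>, whitespace trimmed.
        push @cells, $parser->get_trimmed_text('/td');
    }
    return @cells;
}
```

From there it is a matter of picking the cell that holds the price, which depends entirely on the page in question.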

    Third, putting the data into an Excel table is done with Spreadsheet::WriteExcel.
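
A small sketch of the Excel step, assuming each sample is a `[time, price]` pair. The sheet name, column headings, and the `write_quotes` helper are illustrative choices, not anything from the question.

```perl
use strict;
use warnings;
use Spreadsheet::WriteExcel;

# Write [time, price] pairs to an .xls file and return the number of data rows.
sub write_quotes {
    my ($file, @quotes) = @_;
    my $workbook = Spreadsheet::WriteExcel->new($file)
        or die "Cannot create $file: $!\n";
    my $sheet = $workbook->add_worksheet('Quotes');

    # Header row.
    $sheet->write( 0, 0, 'Time' );
    $sheet->write( 0, 1, 'Price' );

    my $row = 1;
    for my $quote (@quotes) {
        $sheet->write_string( $row, 0, $quote->[0] );   # time kept as text
        $sheet->write_number( $row, 1, $quote->[1] );   # price as a number
        $row++;
    }
    $workbook->close;
    return $row - 1;
}
```

For example, `write_quotes('quotes.xls', ['17:54', 77.50], ['17:55', 77.62]);` would produce a two-row sheet under the header.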

    And if you didn't know before, now you know why the CPAN is a Good Thing.

    --
    http://fruiture.de
Re: Reading a webpage
by thoth (Novice) on Dec 31, 2002 at 18:29 UTC
    You need to look up Finance::Quote; it will do the job perfectly. I have already implemented something with it, but my version assumes that you have directories set up by industry group name, with a file for each ticker. If you are interested, let me know. I am also currently working on one that will pull the industry groups from Yahoo's stock screener and then pull the symbols from the list with HTML::TableExtract. Thoth
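
For reference, a hedged sketch of what a Finance::Quote version might look like. This is not thoth's implementation. The `'usa'` source name is an assumption that depends on which quote sources your installed version supports, and both helper names are made up.

```perl
use strict;
use warnings;
use Finance::Quote;

# Format one symbol's entry from a fetch() result as "time,symbol,price".
# Returns nothing (undef in scalar context) if the lookup did not succeed.
sub quote_line {
    my ($info, $symbol, $time) = @_;
    return unless $info->{ $symbol, 'success' };
    return join( ',', $time, $symbol, $info->{ $symbol, 'price' } ) . "\n";
}

# Fetch the given symbols and append one line per successful quote to $file.
sub append_quotes {
    my ($file, @symbols) = @_;
    my $q    = Finance::Quote->new;
    my %info = $q->fetch( 'usa', @symbols );   # source name is an assumption
    my $now  = localtime;                      # e.g. "Tue Dec 31 17:54:00 2002"
    open my $out, '>>', $file or die "Cannot open $file: $!\n";
    for my $symbol (@symbols) {
        my $line = quote_line( \%info, $symbol, $now );
        print {$out} $line if defined $line;
    }
    close $out or die "Cannot close $file: $!\n";
    return;
}
```

The resulting comma-separated file should import into Excel directly.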
Re: Reading a webpage
by Juerd (Abbot) on Dec 31, 2002 at 17:48 UTC

    How do I read the HTML of a web page into a character string? Is there some variant of open that specifies a URL?

    You need to work on your searching skills. Use LWP::UserAgent. If you only want to GET a page,

    use LWP::Simple;
    my $page = get 'http://foo.com/';
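
Note that `get` returns undef when the fetch fails, so a slightly more defensive version might look like the sketch below. The `fetch_page` name is made up for illustration.

```perl
use strict;
use warnings;
use LWP::Simple qw(get);

# Fetch a URL and return its content as a string, or die if the fetch failed.
sub fetch_page {
    my ($url) = @_;
    my $page = get($url);
    die "Could not fetch $url\n" unless defined $page;
    return $page;
}
```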

    Next time, use Google, search.cpan.org and Super Search first, so we can use our time to answer harder questions instead.

    - Yes, I reinvent wheels.
    - Spam: Visit eurotraQ.
    

      Thanks, that helped me too. What is this RTFM everyone keeps saying? j/k.