blackadder has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks;

This is my first attempt at anything like this, so I am not sure what I need in terms of libraries or where to start.

My question is: how can I automate the following?

1- Go to a certain web site.
2- Go to a certain area of the site - let's say the message board.
3- Download all the messages posted today.


Basically I need to know how to navigate the web from a script: how to make it click a certain button or fill in forms automatically. I am not sure whether I need to know CGI or whether it is achievable with simple OLE libraries. However, if I can get the above three steps working, then I can expand on them and learn the rest.

Any Perls of wisdom on how these steps can be achieved are highly appreciated; your replies will definitely light the path for me.

Best Regards and many thanks in advance.

Replies are listed 'Best First'.
Re: Starting Internet related scripts
by kvale (Monsignor) on Oct 17, 2002 at 20:16 UTC
    The first step is to figure out the precise web addresses for all the information you want to download. Then look up LWP::Simple - the module provides a simple web client interface, perfect for downloading web content.
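
    A minimal sketch of that approach - the URL below is hypothetical, so substitute the real address of the message board:

    use LWP::Simple qw(get);

    my $url = 'http://www.example.com/board/today';   # hypothetical address

    my $html = get($url)
        or die "Couldn't fetch $url\n";

    print $html;    # raw HTML, ready to parse or save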

    -Mark

Re: Starting Internet related scripts
by nothingmuch (Priest) on Oct 17, 2002 at 21:24 UTC
    If you need to figure out the web content, you should probably use HTML::Parser on data from LWP::Simple (or other LWP modules).
    You can then trap anchors (<a href>) and find out which one you want, or something of the sort.
    a useless example:
    use HTML::Parser;
    use LWP::Simple qw(get);

    my $url = 'http://www.perlmonks.org/';
    my @links;

    # make a new parser that stacks the href of each anchor tag onto @links
    my $parser = HTML::Parser->new(
        api_version => 3,
        start_h     => [
            sub { push @links, $_[0]{href} if defined $_[0]{href} },
            'attr',    # pass the handler a hashref of the tag's attributes
        ],
    );
    $parser->report_tags('a');      # only handle <a> tags

    my $html = get($url)
        or die "Couldn't fetch $url\n";
    $parser->parse($html);          # parse perlmonks.org
    print join("\n", @links);      # print all links from the root page


    -nuffin
    zz zZ Z Z #!perl
Re: Starting Internet related scripts
by ajt (Prior) on Oct 18, 2002 at 08:18 UTC

    Depending on how many hoops you have to jump through this could be very easy or very hard.

    If you don't have to contend with logging in and cookies, then it could be easier to use a tool such as wget to get the work done. It runs on most flavours of Unix, and on Windows if you have Cygwin installed.

    A more Perlish solution would be to use LWP to interact with the web server; it can handle logins and cookies for you, and grab the pages you want. There is a range of HTML parsing tools at your disposal, and to the list already suggested I would add HTML::TreeBuilder, which I think is quite good and often overlooked.
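
    As a rough sketch of that login-and-fetch pattern (the URLs and form field names below are hypothetical - adjust them to match the real site):

    use LWP::UserAgent;
    use HTTP::Cookies;

    my $ua = LWP::UserAgent->new;
    $ua->cookie_jar( HTTP::Cookies->new );    # keep session cookies between requests

    # log in via a hypothetical form; field names must match the real page
    my $login = $ua->post(
        'http://www.example.com/login',
        { user => 'blackadder', passwd => 'secret' },
    );
    die 'Login failed: ', $login->status_line, "\n"
        unless $login->is_success or $login->is_redirect;

    # now fetch the message board with the session cookie in place
    my $board = $ua->get('http://www.example.com/board/today');
    die 'Fetch failed: ', $board->status_line, "\n"
        unless $board->is_success;

    print $board->content;    # hand this HTML to the parser of your choice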

    One thing I would recommend is Sean Burke's excellent "Perl and LWP" (ISBN 0596001789) from O'Reilly. It's a little on the slim side, but it does cover the LWP modules and several of the HTML parsing modules, with plenty of examples and useful explanation. There are reviews: Perl & LWP and Perl and LWP.

    Other good resources are davorg's "Data Munging with Perl" (ISBN 1930110006), which has a good chunk on grabbing and parsing web pages, and the long-defunct Web Client Programming with Perl.


    --
    ajt
Re: Starting Internet related scripts
by sch (Pilgrim) on Oct 17, 2002 at 22:11 UTC

    I don't know if you're talking about Yahoo! Groups, but why not have a look at the WWW::Yahoo::Groups module on CPAN?

    I've been having a play with it and it seems to work fairly well, and should give you some ideas.
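
    A short sketch along the lines of the module's synopsis (the group name, credentials, and message number below are hypothetical, so check the module's documentation for the current interface):

    use WWW::Yahoo::Groups;

    my $y = WWW::Yahoo::Groups->new;
    $y->login( 'username' => 'password' );   # hypothetical credentials
    $y->list('SomeGroup');                   # hypothetical group name

    # fetch a single message by number; loop over a range for a day's worth
    my $email = $y->fetch_message(2345);
    print $email;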