RayRay459 has asked for the wisdom of the Perl Monks concerning the following question:

Fellow Monks,
I have a dozen links that I need to go through to get the information off of each web page. Can anyone point me in the right direction as to where to get started? Also, I need to log in to these web pages; how do I get those variables read in so that I can proceed? Any advice will help.
Thanks,
Ray

Replies are listed 'Best First'.
Re: Retrieving contents of web pages
by OeufMayo (Curate) on Aug 29, 2001 at 04:43 UTC

    If you want to avoid the complexity of LWP and other HTML parsing modules, you may want to look at WWW::Chat, which is one of the easiest ways to navigate through websites with Perl. This module creates LWP + HTML::Form scripts automatically via the webchatpp program.
    There are still some features missing from this module, but it usually does a fair job. And more features may be added soon!

    A simple example webchatpp script of what you want may look like this:

    GET http://www.mysite.com/loginpage.html
    EXPECT OK
    FORM login
      F login=OeufMayo
      F password=s33kret
      CLICK
    EXPECT OK
    FOLLOW /Interesting link/
    EXPECT OK
    print join("\n", map { "$_->[1]\n\tURL: $_->[0]" } @links), "\n";

    Pretty simple, isn't it?

    <kbd>--
    my $OeufMayo = new PerlMonger::Paris({http => 'paris.mongueurs.net'});</kbd>
      OeufMayo, thank you very much for your sample code. That looks like it may work. I'll look into it more deeply and probably post code if I get it to work. Thanks again.
      Ray
Re: Retrieving contents of web pages
by LD2 (Curate) on Aug 29, 2001 at 03:50 UTC
    Check out the documentation of LWP. There is a small example in the documentation. You'll also want to look at LWP::UserAgent and lwpcook - which is the libwww-perl cookbook. Good luck.
      Thank you for your advice. I will check both of them out. Browsing over the links that you gave me, I still need to find a way to enter a username and password to gain access to the page. Thanks again.
      Ray
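      A minimal LWP::UserAgent sketch of that login step, assuming the site uses an ordinary POST form plus a session cookie (the URL and the login/password field names below are placeholders; take the real ones from the login page's HTML source):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;

my $ua = LWP::UserAgent->new;
$ua->cookie_jar(HTTP::Cookies->new);   # keep the session cookie between requests

# Hypothetical login URL and field names -- substitute the real
# form action and input names from the page source.
my $res = $ua->post('http://www.example.com/login.cgi',
    { login => 'RayRay459', password => 's33kret' });
die "login failed: ", $res->status_line unless $res->is_success;

# The same $ua, now carrying the cookie, can fetch each of the dozen pages.
for my $url (qw(http://www.example.com/page1.html
                http://www.example.com/page2.html)) {
    my $page = $ua->get($url);
    print $page->content if $page->is_success;
}
```

      The cookie jar is what keeps you logged in between requests; without it, each get() starts a fresh, unauthenticated session.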
Re: Retrieving contents of web pages
by cajun (Chaplain) on Aug 29, 2001 at 07:10 UTC
      Thank you very much for the advice. I will check these out as well.
      Ray
Re: Retrieving contents of web pages
by the_slycer (Chaplain) on Aug 29, 2001 at 03:23 UTC
Re: Retrieving contents of web pages
by andye (Curate) on Aug 29, 2001 at 12:45 UTC
    Something like this?
    use LWP::Simple; my $page = get('http://username:password@www.example.com/page.html');
    (I haven't tested this way of passing the username and password, but it should work.) andy.
      Quoting from RFC 2396:
      3.2.2. Server-based Naming Authority

         URL schemes that involve the direct use of an IP-based protocol to a
         specified server on the Internet use a common syntax for the server
         component of the URI's scheme-specific data:

            <userinfo>@<host>:<port>

         where <userinfo> may consist of a user name and, optionally, scheme
         specific information about how to gain authorization to access the
         server. The parts "<userinfo>@" and ":<port>" may be omitted.

            server        = [ [ userinfo "@" ] hostport ]

         The user information, if present, is followed by a commercial
         at-sign "@".

            userinfo      = *( unreserved | escaped |
                               ";" | ":" | "&" | "=" | "+" | "$" | "," )

         Some URL schemes use the format "user:password" in the userinfo
         field. This practice is NOT RECOMMENDED, because the passing of
         authentication information in clear text (such as URI) has proven
         to be a security risk in almost every case where it has been used.
      BTW, I am using my $page = get('http://username:password@www.example.com/page.html'); and it is working. (I am not using this over the internet; see the security advice above.)
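      If the site uses HTTP Basic authentication (a browser pop-up rather than an HTML form), LWP can send the credentials without embedding them in the URL. A sketch, assuming a hypothetical host and realm name (the realm is the label shown in the browser's authentication dialog):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use MIME::Base64 qw(encode_base64);

# 'Members Only' is a made-up realm; use the one the server actually sends.
my $ua = LWP::UserAgent->new;
$ua->credentials('www.example.com:80', 'Members Only',
                 'username' => 'password');
my $res = $ua->get('http://www.example.com/page.html');
print $res->content if $res->is_success;

# Under the hood this is just a request header, built like so:
my $auth = 'Basic ' . encode_base64('username:password', '');
```

      This is the same base64-encoded user:password pair that the user:password@host URL form produces, which is why RFC 2396 warns that it is clear text in all but name.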