RayRay459 has asked for the wisdom of the Perl Monks concerning the following question:

Fellow Monks,
I have a dozen links that I need to go through to get the information off of each web page. Can anyone point me in the right direction as to where to get started? Also, I need to log in to these web pages; how do I get those variables read in so that I can proceed? Any advice will help.
Thanks,
Ray

Replies are listed 'Best First'.
Re: Retrieving contents of web pages
by OeufMayo (Curate) on Aug 29, 2001 at 04:43 UTC

    If you want to avoid the complexity of LWP and other HTML parsing modules, you may want to look at WWW::Chat, which is one of the easiest ways to navigate through websites with Perl. This module creates LWP + HTML::Form scripts automatically via the webchatpp program.
    There are still some features missing from this module, but it usually does a fair job. And more features may be added soon!

    A simple example webchatpp script of what you want may look like this:

    GET http://www.mysite.com/loginpage.html
    EXPECT OK
    FORM login
      F login=OeufMayo
      F password=s33kret
      CLICK
    EXPECT OK
    FOLLOW /Interesting link/
    EXPECT OK
    print join("\n", map { "$_->[1]\n\tURL: $_->[0]" } @links), "\n";

    Pretty simple, isn't it?

    <kbd>--
    my $OeufMayo = new PerlMonger::Paris({http => 'paris.mongueurs.net'});</kbd>
      OeufMayo, thank you very much for your sample code. That looks like it may work. I'll look into it more deeply and probably post code if I get it to work. Thanks again.
      Ray
Re: Retrieving contents of web pages
by LD2 (Curate) on Aug 29, 2001 at 03:50 UTC
    Check out the documentation of LWP. There is a small example in the documentation. You'll also want to look at LWP::UserAgent and lwpcook - which is the libwww-perl cookbook. Good luck.
      Thank you for your advice. I will check both of them out. Browsing over the links that you gave me, I still need to find a way to enter a username and password to gain access to the page. Thanks again.
      Ray
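      A minimal LWP::UserAgent sketch of that login step, assuming the site uses an ordinary POST form plus a session cookie (the URL and the login/password field names below are placeholders; take the real ones from the login page's HTML source):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;

my $ua = LWP::UserAgent->new;
$ua->cookie_jar(HTTP::Cookies->new);   # keep the session cookie between requests

# Hypothetical login URL and field names -- substitute the real
# form action and input names from the page source.
my $res = $ua->post('http://www.example.com/login.cgi',
    { login => 'RayRay459', password => 's33kret' });
die "login failed: ", $res->status_line unless $res->is_success;

# The same $ua, now carrying the cookie, can fetch each of the dozen pages.
for my $url (qw(http://www.example.com/page1.html
                http://www.example.com/page2.html)) {
    my $page = $ua->get($url);
    print $page->content if $page->is_success;
}
```

      The cookie jar is what keeps you logged in between requests; without it, each get() starts a fresh, unauthenticated session.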
Re: Retrieving contents of web pages
by cajun (Chaplain) on Aug 29, 2001 at 07:10 UTC
      Thank you very much for the advice. I will check these out as well.
      Ray
Re: Retrieving contents of web pages
by the_slycer (Chaplain) on Aug 29, 2001 at 03:23 UTC
Re: Retrieving contents of web pages
by andye (Curate) on Aug 29, 2001 at 12:45 UTC
    Something like this?
    use LWP::Simple; my $page = get('http://username:password@www.example.com/page.html');
    (I haven't tested this way of passing the username and password, but it should work.) andy.
      Quoting from RFC 2396:
      3.2.2. Server-based Naming Authority

         URL schemes that involve the direct use of an IP-based protocol to a
         specified server on the Internet use a common syntax for the server
         component of the URI's scheme-specific data:

            <userinfo>@<host>:<port>

         where <userinfo> may consist of a user name and, optionally, scheme
         specific information about how to gain authorization to access the
         server. The parts "<userinfo>@" and ":<port>" may be omitted.

            server        = [ [ userinfo "@" ] hostport ]

         The user information, if present, is followed by a commercial
         at-sign "@".

            userinfo      = *( unreserved | escaped |
                               ";" | ":" | "&" | "=" | "+" | "$" | "," )

         Some URL schemes use the format "user:password" in the userinfo
         field. This practice is NOT RECOMMENDED, because the passing of
         authentication information in clear text (such as URI) has proven
         to be a security risk in almost every case where it has been used.
      BTW, I am using my $page = get('http://username:password@www.example.com/page.html'); and it is working. (I am not using this over the internet; see the security advice above.)
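      If the site uses HTTP Basic authentication (a browser pop-up rather than an HTML form), LWP can send the credentials without embedding them in the URL. A sketch, assuming a hypothetical host and realm name (the realm is the label shown in the browser's authentication dialog):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use MIME::Base64 qw(encode_base64);

# 'Members Only' is a made-up realm; use the one the server actually sends.
my $ua = LWP::UserAgent->new;
$ua->credentials('www.example.com:80', 'Members Only',
                 'username' => 'password');
my $res = $ua->get('http://www.example.com/page.html');
print $res->content if $res->is_success;

# Under the hood this is just a request header, built like so:
my $auth = 'Basic ' . encode_base64('username:password', '');
```

      This is the same base64-encoded user:password pair that the user:password@host URL form produces, which is why RFC 2396 warns that it is clear text in all but name.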