cdherold has asked for the wisdom of the Perl Monks concerning the following question:

Dearest Monks,

I am trying to pull down link content from webpages that can only be accessed with a username and password through a proxy URL. To reach the webpage manually, I have to set the proxy URL under the options tab (Firefox); when I then open the webpage, it asks me for a username and password. I enter these, and the website recognizes me as a specific user.

I am having trouble understanding how I would do the same in Perl. I am using LWP::Simple to perform get() on links and obtain their content (this works on webpages that are not restricted), but I am not clear on how to mimic the proxy URL with user/pass to access the restricted webpages. I have been looking at setting the $ENV variable, as well as using the proxy feature of LWP::UserAgent:

$ua->proxy(['http', 'ftp'], 'http://proxy.sn.no:8001/');

but have so far been unsuccessful. I am feeling a little lost so I have come for your guidance.

Most Humbly,

Chris

Re: Accessing webpages with proxy url requiring user/pass
by Fletch (Bishop) on Sep 07, 2007 at 17:03 UTC

    You're a bit fuzzy (at least it's not clear to me) on whether it's the proxy which is requiring authentication, the remote server which is using browser-based authentication (i.e. HTTP Basic authentication), or the target site which has a login page which you need to fill in.
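One way to tell the first two cases apart is to look at the status code of the failed response: a proxy that wants credentials answers 407 with a Proxy-Authenticate header, while a remote server using HTTP Basic authentication answers 401 with a WWW-Authenticate header. The following is a minimal sketch of that check. The helper name `who_wants_auth` and the hand-built responses are hypothetical and only there for illustration; in practice the response would come from `$ua->request(...)`.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical helper: report who is asking for credentials,
# based on the HTTP status code and the matching challenge header.
sub who_wants_auth {
    my ($res) = @_;
    return 'proxy: '  . ($res->header('Proxy-Authenticate') // 'no challenge header')
        if $res->code == 407;    # 407 Proxy Authentication Required
    return 'server: ' . ($res->header('WWW-Authenticate') // 'no challenge header')
        if $res->code == 401;    # 401 Unauthorized
    return 'none';               # e.g. a 200 page that contains a login form
}

# Hand-built responses standing in for real ones (assumed values).
my $from_proxy = HTTP::Response->new(
    407, 'Proxy Authentication Required',
    [ 'Proxy-Authenticate' => 'Basic realm="proxy.example.com"' ],
);
my $from_server = HTTP::Response->new(
    401, 'Unauthorized',
    [ 'WWW-Authenticate' => 'Basic realm="members"' ],
);

print who_wants_auth($from_proxy),  "\n";
print who_wants_auth($from_server), "\n";
```

A result of 'none' on a page that still shows a login form would point to the third case, where the site itself handles the login.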

    In the first case, read perldoc lwpcook for its section on proxies which gives an example of the syntax to use to connect through an authenticating HTTP proxy. In the second case, you want to read "Access to protected documents" in the same place.
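For reference, a rough sketch of what those two lwpcook sections describe, using the header helpers on the request object. The proxy address, target URL, and all usernames and passwords below are placeholders, not values from this thread, and both headers are set on one request only to show them side by side; normally only the one matching your situation is needed.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

my $ua = LWP::UserAgent->new;

# Route http and ftp requests through the proxy (placeholder address).
$ua->proxy(['http', 'ftp'], 'http://proxy.example.com:8001/');

my $req = HTTP::Request->new(GET => 'http://www.example.com/protected/');

# Case 1: the proxy itself asks for a username and password.
$req->proxy_authorization_basic('proxyuser', 'proxypass');

# Case 2: the remote server uses HTTP Basic authentication.
$req->authorization_basic('siteuser', 'sitepass');

# Show the request that would be sent, including both headers.
print $req->as_string;

# Uncomment to actually send the request through the proxy:
# my $res = $ua->request($req);
# print $res->is_success ? $res->decoded_content : $res->status_line, "\n";
```

Putting the credentials in the proxy URL, as in `http://user:pass@proxy.example.com`, is the other form lwpcook shows for the authenticating-proxy case.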

    As for the final case, you probably want WWW::Mechanize as has been already mentioned.

      Mechanize doesn't meet my needs in this case (as far as I can tell). LWPcook was exactly what I needed -- see PROXIES.

      The following got me logged in to the website programmatically, which then allowed me to open the links on the site ($link) and view their content ($body).

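      use LWP::UserAgent;
      use HTTP::Request;

      my $ua = LWP::UserAgent->new;
      # The username and password go into the proxy URL itself.
      $ua->proxy(['http', 'ftp'] => 'http://user:pass@proxy.site');
      my $req  = HTTP::Request->new(GET => $link);
      my $res  = $ua->request($req);
      my $body = $res->content;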

      It is important to make sure that proxy.site is not the URL that you entered as the proxy server in the web browser's options, but the one that is shown in the user/pass request box when you enter those values manually (that took me a while to figure out).

      Thanks for pointing LWPcook out. There are a number of copies out there, but the only complete one is on CPAN at http://search.cpan.org/~gaas/libwww-perl-5.808/lwpcook.pod

      Chris Herold
Re: Accessing webpages with proxy url requiring user/pass
by dwm042 (Priest) on Sep 07, 2007 at 16:45 UTC
    Without knowing how the proxy works, it's hard for me to say what to do, but the king of interactive response and screen scraping is the module WWW::Mechanize, which can be found on CPAN.
Re: Accessing webpages with proxy url requiring user/pass
by b4swine (Pilgrim) on Sep 08, 2007 at 02:55 UTC
    Read up on the WWW::Mechanize module. It allows you to fill in fields, usernames, passwords, etc. on web pages, and much more.