justinm1 has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I've seen quite a bit on this topic, both here and around the net, but I'm still having some issues with this code. Basically, this test script is attempting to return the raw html for the site defined below. It works fine for http, but not for https, which returns a 400 error (bad request). I've already set $ENV{HTTPS_PROXY} and am also using Crypt::SSLeay, both recommended fixes I've seen elsewhere. It could be related to the proxy, but I unfortunately can't test it without one (I've changed the actual proxy address for my company due to paranoia). Here's the code:
use strict;
$ENV{HTTPS_PROXY} = 'my_proxy:8080';
use HTML::Parser;
use WWW::Mechanize;
use Crypt::SSLeay;
use LWP::Debug qw(+);

my $mech = WWW::Mechanize->new();
$mech->agent('Mozilla/5.0');
$mech->proxy(['https', 'http', 'ftp'], 'my_proxy:8080');

# http
$mech->get("http://www.google.com");

# https
#$mech->get("https://www.cia.gov");

my $c = $mech->content;
print $c;
Anything I'm obviously doing wrong here? I've hit a wall with my little project due to this. Any help would be much appreciated...

Re: WWW::Mechanize with https and a proxy
by almut (Canon) on Oct 10, 2007 at 00:35 UTC

    IIRC, you're not supposed to set the proxy through the Mech user agent object, too, when you're using Crypt::SSLeay and the environment variable HTTPS_PROXY — Crypt::SSLeay should handle it transparently all by itself...  So, try to remove 'https' from

    $mech->proxy(['https', 'http', 'ftp'], 'my_proxy:8080');

    ( or maybe even explicitly put $mech->proxy('https', undef); )
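A minimal sketch of that suggestion, for illustration only. It assumes the placeholder proxy from the original post, written with an http:// prefix because recent LWP versions reject a proxy value without a scheme. The noproxy option and the build_mech helper are additions for this sketch, not something from the original script: as far as I know, WWW::Mechanize calls env_proxy() on construction unless noproxy is set, which would copy HTTPS_PROXY into LWP's own https proxy setting. Whether Net::SSL is the SSL backend actually in use depends on the installed LWP version, so treat this as a sketch of the workaround rather than a guaranteed fix.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Assumption: 'my_proxy:8080' is the placeholder host from the original post.
my $proxy_url = 'http://my_proxy:8080';

# Net::SSL (from the Crypt-SSLeay distribution) reads this variable itself.
$ENV{HTTPS_PROXY} = $proxy_url;

# Hypothetical helper: build a Mech object that proxies only http and ftp.
sub build_mech {
    # noproxy => 1 stops Mech from calling env_proxy() in the constructor;
    # autocheck => 0 lets us inspect a failed response instead of dying.
    my $mech = WWW::Mechanize->new( noproxy => 1, autocheck => 0 );
    $mech->agent('Mozilla/5.0');

    # https is deliberately left out of the scheme list.
    $mech->proxy( [ 'http', 'ftp' ], $proxy_url );
    return $mech;
}

my $mech = build_mech();

# Fetch only when a URL is given on the command line.
if ( my $url = shift @ARGV ) {
    $mech->get($url);
    if ( $mech->success ) {
        print $mech->content;
    }
    else {
        print "Request failed: ", $mech->res->status_line, "\n";
    }
}
```

Run it as, say, `perl fetch.pl https://www.cia.gov` (the script name is arbitrary); with no argument it only sets up the object and exits.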

      IMHO, WWW::Mechanize is a subclass of LWP::UserAgent, so it's OK to use its proxy() method (just as with agent()).

Re: WWW::Mechanize with https and a proxy
by Gangabass (Vicar) on Oct 10, 2007 at 03:37 UTC

    First, you don't need use Crypt::SSLeay; -- LWP will do the whole job for you.

    Second, I think $mech->proxy(['https', 'http', 'ftp'], 'my_proxy:8080'); is much better than $ENV{HTTPS_PROXY} = 'my_proxy:8080';

    And third, I think your code is good. So you're right (IMHO): this is a problem with your proxy server, not with your code.

      First, you don't need use Crypt::SSLeay; -- LWP will do the whole job for you.

      Nope, that's not true. LWP delegates the handling of setting up the SSL connection to Crypt::SSLeay, or, more specifically, the Net::SSL package therein (which LWP will load on demand). These days, Crypt::SSLeay itself is an empty shell.

      On the possibility that I could be misunderstanding you, let me put it another way: yes, the declaration is not needed in the client code, but the Crypt-SSLeay distribution must be installed for an https connection to work.
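To check whether the pieces are actually installed, something like the following could be used. It is only a sketch: module_available is a hypothetical helper written for this example, not part of LWP or Crypt::SSLeay, and it tests nothing more than whether each module can be loaded.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of LWP): returns 1 if the named module
# can be loaded, 0 otherwise.
sub module_available {
    my ($module) = @_;
    ( my $file = "$module.pm" ) =~ s{::}{/}g;    # Net::SSL -> Net/SSL.pm
    return eval { require $file; 1 } ? 1 : 0;
}

# Report on the modules discussed in this thread.
for my $module (qw(LWP::UserAgent Net::SSL Crypt::SSLeay)) {
    printf "%-16s %s\n", $module,
        module_available($module) ? 'installed' : 'missing';
}
```

If Net::SSL shows up as missing, the Crypt-SSLeay distribution is not installed, regardless of whether the script itself says use Crypt::SSLeay.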

      While you may not like the environment variable hack, it's about the only way to transmit out-of-band information to a delegated-to package, without polluting the call stack all the way down. I suppose some sort of API could be added at the class level, but it would probably be tricky to get right when threads are taken into account.

      • another intruder with the mooring in the heart of the Perl

        Of course, I meant the declaration.

      ... $mech->proxy(['https', 'http', 'ftp'], 'my_proxy:8080'); much better than $ENV{HTTPS_PROXY} = 'my_proxy:8080';

      You are correct in theory :)  In practice, though, I think it's a known problem that LWP's proxy method doesn't work with HTTPS (see the yellow box on the right on the page linked to). This has come up a couple of times (here at PM, and elsewhere), and the workaround has so far typically been to use Crypt::SSLeay in combination with the HTTPS_PROXY env var. So, unless the issue has been fixed in the meantime (which I don't think it has), it's always a good default strategy to try what has worked for others...  Hopefully, the OP will report back with what worked in this case.

        Just an FYI, I've found I definitely need both Crypt::SSLeay and the env variable setting; otherwise it hangs (so I can confirm the above statement). At least I get something back when I include them, even if it is a 400 error. If this script works for others without a proxy, then I can only assume it's something wonky with our proxy, which is a bummer. I'll ping the internal help group again and report back if there's an update.

        As I remember, LWP worked fine for me through an (HTTP) proxy with some Google pages (https://adwords.google.com/select/main?cmd=KeywordSandbox). As far as I recall, LWP can't work with a SOCKS proxy :-(. For that kind of task I use Crypt::SSLeay.