Roy Johnson has asked for the wisdom of the Perl Monks concerning the following question:

First, I want to note, for the benefit of anyone who has had to deal with a cantankerous web server, that if the server only wants to serve Internet Explorer clients, it may help to code like this:
    my $a = WWW::Mechanize->new(
        autocheck  => 1,
        agent      => 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; T312461)',
        keep_alive => 1,
    );
    # Extra headers that make the request look more like Internet Explorer's
    my @silly_headers = (
        'Accept-Language' => 'en-us',
        'Accept-Encoding' => 'gzip, deflate',
        'Accept'          => '*/*',
    );
    $a->get($testpage, @silly_headers);
That solved the "Not implemented" error I had been getting from Mechanize.
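LWP::UserAgent's get method, which WWW::Mechanize's get builds on, treats the arguments after the URL as header => value pairs and builds the request with HTTP::Request::Common's GET. The sketch below builds the equivalent request offline, so you can see exactly which headers the snippet above adds. build_request is a hypothetical helper and the example.com URL is a placeholder; the User-Agent string does not appear because the agent object adds it only when the request is actually sent.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Request::Common qw(GET);

# The same browser-like headers as in the snippet above.
my @silly_headers = (
    'Accept-Language' => 'en-us',
    'Accept-Encoding' => 'gzip, deflate',
    'Accept'          => '*/*',
);

# Hypothetical helper: build the request the way LWP::UserAgent::get does,
# without sending it, so the headers can be inspected.
sub build_request {
    my ($url, @headers) = @_;
    return GET($url, @headers);
}

my $req = build_request('http://www.example.com/', @silly_headers);
print $req->as_string;
```

If every request should carry these headers, WWW::Mechanize's add_header method can set them once on the agent instead of passing @silly_headers to each get call.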

Now for the new puzzler, which is not a Perl problem per se, but which I hope has a Perl solution: there is a secure web site that I can access via a browser, but when I try to get to its login page via WWW::Mechanize, the request fails. Unlike my previous problem, this website is public. However, the problem is most likely tied to the proxy server, so your results will probably differ from mine when you run it.

Although the error message appears to come from the proxy, I have no problem accessing the site through the same proxy from a browser. Very odd.

Here's the script:

    #!perl
    # Automated navigation through web pages
    use strict;
    use warnings;
    use WWW::Mechanize;

    my $testpage   = 'https://www.openinvoice.com/docp/login/login.jsp';
    my $teststring = 'Oilfield Commerce Platform (TM) Login';
    # Start from the public front page instead and follow its login link
    $testpage   = 'http://www.openinvoice.com';
    $teststring = 'login now';

    my $a = WWW::Mechanize->new(
        autocheck  => 1,
        agent      => 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; T312461)',
        keep_alive => 1,
    );
    # Send both HTTP and HTTPS requests through the corporate proxy.
    # (LWP only uses the host and port of this URL; a .pac file is a
    # browser auto-config script, not a proxy.)
    $a->proxy(['http', 'https'], 'http://www-proxy:8080/proxy.pac');

    my @silly_headers = (
        'Accept-Language' => 'en-us',
        'Accept-Encoding' => 'gzip, deflate',
        'Accept'          => '*/*',
    );
    $a->get($testpage, @silly_headers);
    $a->success()
        or die "Failed to get $testpage: " . $a->status() . "\n"
             . $a->res()->as_string() . "\n";

    my $link = $a->find_link( text => 'login now.' ) or do {
        # List every link on the page to help diagnose the mismatch,
        # then stop: the code below needs a link to follow
        warn "Did not find link\n";
        warn "Link: " . $_->[1] . ' -> ' . $_->[0] . "\n" for $a->links();
        die "No 'login now.' link found on $testpage\n";
    };
    #use Data::Dumper;
    #die Dumper($link), "\n";

    $a->get($link->[0], @silly_headers);
    $a->success() or warn "Followed link to ", $a->base, "\n";
    $a->success()
        or die "Failed to get " . ($a->base) . ': ' . $a->res()->as_string() . "\n";
    print "Page is ", $a->content, "\n";
And the output I get:
    Followed link to https://www.openinvoice.com/docp/corp/main/login
    Failed to get https://www.openinvoice.com/docp/corp/main/login: HTTP/1.0 500 (Internal Server Error) Error from proxy
    Content-Type: text/html
    Client-Date: Fri, 28 Nov 2003 18:05:11 GMT
    Client-Peer: 148.89.144.220:8080
    Client-Response-Num: 1
    Mime-Version: 1.0
    Proxy-Agent: iPlanet-Web-Proxy-Server/3.6
    Title: Error

    <HTML>
    <HEAD><TITLE>Error</TITLE></HEAD>
    <BODY>
    <H1>Error</H1>
    <BLOCKQUOTE><B>
    <HR SIZE=4><P>
    The requested item could not be loaded by the proxy.<P>
    The certificate issuer for this server is not recognized by Netscape. The security certificate may or may not be valid. Netscape refuses to connect to this server.<P>
    <HR SIZE=4>
    </B></BLOCKQUOTE>
    <P>
    <ADDRESS>Proxy server at flash.sugarland.unocal.com on port 8080</ADDRESS>
    </BODY></HTML>
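One way to confirm that an error page was generated by the proxy rather than by the site is to inspect the response itself: the output above carries a Proxy-Agent header and the status message "Error from proxy". The sketch below applies that heuristic to an HTTP::Response, the class LWP and Mechanize return. error_is_from_proxy is a hypothetical helper, and not every proxy sets Proxy-Agent, so treat a negative answer as inconclusive.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical heuristic: a failed response that carries a Proxy-Agent
# header, or whose status message mentions the proxy, was most likely
# generated by the proxy itself rather than by the origin server.
sub error_is_from_proxy {
    my ($res) = @_;
    return 0 if $res->is_success;
    return 1 if defined $res->header('Proxy-Agent');
    return (($res->message // '') =~ /proxy/i) ? 1 : 0;
}

# A response shaped like the one in the output above.
my $res = HTTP::Response->new(
    500, 'Error from proxy',
    [ 'Content-Type' => 'text/html',
      'Proxy-Agent'  => 'iPlanet-Web-Proxy-Server/3.6' ],
    '<HTML><BODY>The requested item could not be loaded by the proxy.</BODY></HTML>',
);

print error_is_from_proxy($res) ? "proxy error\n" : "server error\n";
```

In the script above, the same check could be applied to $a->res() just before dying.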

The PerlMonk tr/// Advocate

Re: More Mechanize Woes
by calin (Deacon) on Nov 28, 2003 at 19:12 UTC
    The requested item could not be loaded by the proxy.<P> The certificate issuer for this server is not recognized by Netscape. The security certificate may or may not be valid. Netscape refuses to connect to this server.<P>

    It seems that the proxy doesn't allow you to connect to SSL sites that don't have valid certificates (from its point of view).

    Either modify the proxy configuration to lift this restriction or, if you have a custom certificate issuer, install it in the proxy (it's your choice).

      The interesting thing about it, though, is that the browser, which uses the very same proxy, does not have this problem. Does that make any sense?

      The PerlMonk tr/// Advocate
Solved! Re: More Mechanize Woes
by Roy Johnson (Monsignor) on Dec 04, 2003 at 18:30 UTC
    Some accumulated wisdom and another question:
    Some sniffing and blundering around on the net led me to the Crypt::SSLeay docs page, which says:
    At the time of this writing, libwww v5.6 seems to proxy https requests fine with an Apache mod_proxy server. It sends a line like: GET https://www.nodeworks.com HTTP/1.1 to the proxy server, which is not the CONNECT request that some proxies would expect, so this may not work with other proxy servers than mod_proxy. The CONNECT method is used by Crypt::SSLeay's internal proxy support.
    Sure enough, using a Crypt::SSLeay-style proxy specification for HTTPS, I get the login page I have sought for so long.
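Following the Crypt::SSLeay documentation quoted above, the idea is to leave https out of LWP's own proxy table and let Crypt::SSLeay tunnel through the proxy with CONNECT, configured through the HTTPS_PROXY environment variable. This is a sketch under assumptions: configure_proxies is a hypothetical helper, the proxy URL is a placeholder, and it describes the 2003-era setup where Crypt::SSLeay was LWP's HTTPS backend. Current libwww-perl defaults to IO::Socket::SSL and may behave differently, so check the documentation for your version. Passing noproxy => 1 stops WWW::Mechanize from calling env_proxy, which could otherwise copy HTTPS_PROXY into LWP's own https proxy setting.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper: HTTPS goes through Crypt::SSLeay's own proxy
# support (CONNECT, configured via HTTPS_PROXY); plain HTTP goes through
# LWP's proxy table.
sub configure_proxies {
    my ($mech, $proxy_url) = @_;
    $ENV{HTTPS_PROXY} = $proxy_url;
    # Deliberately no 'https' here: listing it would make LWP send
    # "GET https://..." to the proxy instead of opening a CONNECT tunnel.
    $mech->proxy('http', $proxy_url);
    return $mech;
}

# Only touch the network when a URL is given on the command line.
if (my $url = shift @ARGV) {
    my $mech = configure_proxies(
        WWW::Mechanize->new(autocheck => 1, noproxy => 1),
        'http://www-proxy:8080',    # placeholder; substitute your proxy
    );
    $mech->get($url);               # plain request, no extra headers
    print $mech->title() // '(no title)', "\n";
}
```

Usage, assuming the script is saved as fetch_login.pl: `perl fetch_login.pl https://www.openinvoice.com/docp/login/login.jsp`.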

    The new problem? It doesn't appear to get decrypted. When I print out $a->content, I get gibberish, which is going to make it difficult to Mechanize the login.

    Update:
    It turns out that the silly headers I was passing to make my script look more like a browser were the problem. Once I stopped passing them with the HTTPS request, I suddenly got a normal web page.
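The gibberish was most likely not an SSL problem at all: one of those headers, Accept-Encoding: gzip, deflate, tells the server it may compress the response, and $a->content returned the compressed bytes as-is. The sketch below uses the core modules IO::Compress::Gzip and IO::Uncompress::Gunzip to simulate such a body and decode it; decode_body is a hypothetical helper, and it handles gzip only, not deflate.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Compress::Gzip     qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Hypothetical helper: return the body as text, gunzipping it first when
# the response's Content-Encoding says gzip. Deflate is not handled.
sub decode_body {
    my ($raw, $content_encoding) = @_;
    return $raw
        unless defined $content_encoding && $content_encoding =~ /\bgzip\b/i;
    my $plain;
    gunzip(\$raw => \$plain) or die "gunzip failed: $GunzipError\n";
    return $plain;
}

# Simulate what a server may send back after "Accept-Encoding: gzip, deflate".
my $page = "<html><title>Login</title></html>\n";
my $compressed;
gzip(\$page => \$compressed) or die "gzip failed: $GzipError\n";

print decode_body($compressed, 'gzip');
```

Newer HTTP::Message releases also provide decoded_content on a response, which undoes Content-Encoding for you; not sending Accept-Encoding at all, as in the fix above, avoids the issue entirely.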

    Problem solved, thanks for your help, and I hope that somebody finds this useful down the road.

