in reply to How to scrape an HTTPS website that has JavaScript

I once needed to work with JavaScript. I just parsed the HTML, grepped for the variable initialization I needed, and pulled the value out with a regex. There's no need to execute the code just to concatenate a couple of strings together.
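Roughly, the regex approach might look like the sketch below. The page content, the variable names (sessionKey, baseUrl) and the helper js_string_var are all made up for illustration, and it only handles simple quoted string assignments, not escaped quotes or concatenated expressions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical page content; the variable names are invented for this example.
my $html = <<'HTML';
<html><head><script type="text/javascript">
var baseUrl = "https://example.com/app";
var sessionKey = 'abc123';
</script></head><body></body></html>
HTML

# Return the string literal assigned to a JavaScript variable, or undef
# if no simple assignment to that name is found.
sub js_string_var {
    my ($source, $name) = @_;
    # Matches: var NAME = "value"  or  NAME = 'value'
    # \1 makes sure the closing quote is the same kind as the opening one.
    if ($source =~ /\b(?:var\s+)?\Q$name\E\s*=\s*(["'])(.*?)\1/s) {
        return $2;
    }
    return;
}

my $key = js_string_var($html, 'sessionKey');
print defined $key ? "sessionKey = $key\n" : "sessionKey not found\n";
```

In a real scraper, $html would be the decoded content of the response rather than a heredoc.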

As for the other part of your request, the HTTPS side, I've had mixed results. I have been unable to get HTTPS working through our proxy/firewall at work, although plain HTTP goes through the proxy without a problem. A bit of googling led me to a link dated 2001 that says libwww and Crypt::SSLeay almost work together, but don't.

Here's a short example I've used to illustrate the problem. It fails for me on ActiveState Perl 5.8.4 on Windows XP. (*nix is not an option, as this is my work machine, and neither is Cygwin.)

    #!/usr/bin/perl
    use warnings;
    use strict;
    use LWP::UserAgent;
    #http://groups.yahoo.com/group/libwww-perl/message/7242
    $|=1;
    my @hosts = ('http://login.yahoo.com', 'https://login.yahoo.com');
    my $ua=LWP::UserAgent->new;
    $ua->agent("Mozilla/5.0 ");
    my $https_proxy=$ENV{https_proxy};
    delete $ENV{https_proxy} if ($https_proxy);
    $ua->env_proxy;
    $ENV{https_proxy}=$https_proxy if ($https_proxy);
    foreach (@hosts) {
        my $req = HTTP::Request->new(GET => $_);
        #$req->proxy_authorization_basic($ENV{HTTP_PROXY_USER}, $ENV{HTTP_PROXY_PASS});
        my $res = $ua->request($req);
        if ($res->is_success) {
            print $res->status_line, "\nsomething\n";
        } else {
            print $res->status_line, "\nnothing\n";
        }
    }

Re^2: How to scrape an HTTPS website WOOHOO
by elwarren (Priest) on Aug 24, 2004 at 18:45 UTC
    Woohoo! It works now :-) After digging this old problem out of the dead projects folder, I couldn't leave it alone. It seems that Crypt::SSLeay uses HTTPS_PROXY_USERNAME while LWP uses HTTP_PROXY_USER. In my testing, the HTTPS_PROXY env setting still needs to be deleted before calling env_proxy and then set again afterwards. Changing the env proxy block of my code to this makes it work:
    my $https_proxy=$ENV{HTTPS_PROXY};
    delete $ENV{HTTPS_PROXY} if ($https_proxy);
    $ua->env_proxy;
    $ENV{HTTPS_PROXY}=$https_proxy if ($https_proxy);
    $ENV{HTTPS_PROXY_USERNAME}=$ENV{HTTP_PROXY_USER};
    $ENV{HTTPS_PROXY_PASSWORD}=$ENV{HTTP_PROXY_PASS};
    Yay! Now I have to go write the rest of what I'd set out to do in the first place. Another dead project lives again!
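For anyone reusing this, the credential mapping described above could be wrapped in a small helper along these lines. This is only a sketch: copy_proxy_credentials is a hypothetical name, the alice/secret values are placeholders, and the one deliberate change is that unset variables are skipped instead of being assigned undef.

```perl
use strict;
use warnings;

# Copy the HTTP_PROXY_USER / HTTP_PROXY_PASS values to the
# HTTPS_PROXY_USERNAME / HTTPS_PROXY_PASSWORD names mentioned above.
# $env defaults to the real environment; pass a hash ref to try it safely.
sub copy_proxy_credentials {
    my ($env) = @_;
    $env ||= \%ENV;
    my %map = (
        HTTP_PROXY_USER => 'HTTPS_PROXY_USERNAME',
        HTTP_PROXY_PASS => 'HTTPS_PROXY_PASSWORD',
    );
    for my $from (sort keys %map) {
        # Skip unset values so nothing is assigned undef.
        next unless defined $env->{$from};
        $env->{ $map{$from} } = $env->{$from};
    }
    return $env;
}

# Placeholder credentials in a private hash, so the real %ENV is untouched.
my %fake_env = (HTTP_PROXY_USER => 'alice', HTTP_PROXY_PASS => 'secret');
copy_proxy_credentials(\%fake_env);
print "$_=$fake_env{$_}\n" for sort keys %fake_env;
```

Calling copy_proxy_credentials() with no argument applies the same mapping to %ENV, which is what the two assignment lines in the fix do.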