smocc has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to build a little web crawler using LWP, but the get function from LWP::Simple just won't work. I've written this little test program to try to work it out, but it's not helping.
use strict;
use LWP::Simple;

my $url = "http://kompas.com/kompas-cetak/0505/09/metro/index.htm";
my $conts = get $url or die "I canna do it Cap'n!\n";
print $conts;
It compiles and runs as written, but it just dies with my message; there are no errors about the package. The get call simply returns nothing. I'm running ActivePerl on Windows, which I suspect is contributing to the problem, but I can't figure out how. The Makefile.PL for LWP seemed to run fine. Can you see what's going wrong?

Edit: After finding the guide to PPM and trying to install the package that way, I get this error from ppm:
ppm> install LWP
Error: No valid repositories
Error: 500 Can't connect to ppm.ActiveState.com:80 (Bad protocol 'tcp')
Error: 500 Can't connect to ppm.ActiveState.com:80 (Bad protocol 'tcp')
What's going on here?

"We shall peck them to death tomorrow, my dear."

Re: Problems with LWP
by Joost (Canon) on May 17, 2005 at 12:15 UTC
      Using getprint instead of get returns this error: 500 Can't connect to kompas.com:80 (Bad protocol 'tcp') I'm pretty sure I don't have a firewall running...

      "We shall peck them to death tomorrow, my dear."
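To tell whether the getprint error comes from LWP itself or from the Windows networking layer underneath it, one option is to open a plain TCP connection with the core IO::Socket::INET module, which LWP's HTTP connections go through (via Net::HTTP). This is a minimal sketch, not a fix: the helper tcp_connect_error and the file name tcpcheck.pl are hypothetical names introduced here.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Open a plain TCP connection without LWP. Returns '' on success, or the
# error text IO::Socket::INET leaves in $@ when the connection fails.
sub tcp_connect_error {
    my ($host, $port) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port,
        Proto    => 'tcp',
        Timeout  => 10,    # seconds to wait for the connect
    );
    unless ($sock) {
        return $@ ? $@ : 'unknown error';
    }
    close $sock;
    return '';
}

# Usage (tcpcheck.pl is a hypothetical file name): perl tcpcheck.pl HOST [PORT]
# PORT defaults to 80.
if (@ARGV) {
    my $host = $ARGV[0];
    my $port = defined $ARGV[1] ? $ARGV[1] : 80;
    my $err  = tcp_connect_error($host, $port);
    if ($err eq '') {
        print "Connected to $host:$port\n";
    }
    else {
        print "Cannot connect to $host:$port: $err\n";
    }
}
else {
    print "usage: perl tcpcheck.pl HOST [PORT]\n";
}
```

If `perl tcpcheck.pl kompas.com 80` also reports Bad protocol 'tcp', the problem would appear to sit below LWP rather than in it.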
Re: Problems with LWP
by Roger (Parson) on May 17, 2005 at 12:48 UTC
    Looks like an issue with the proxy. Add these environment variables to your Windows environment settings:

    HTTP_proxy
    HTTP_proxy_user
    HTTP_proxy_pass

    Remember that the HTTP_proxy variable should contain the http:// prefix.
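The three variables and the http:// prefix rule above can be checked from Perl before involving LWP at all. A minimal sketch, assuming a hypothetical helper proxy_env_problems that checks only the prefix rule; whether you need a proxy in the first place depends on your network, so an unset HTTP_proxy is not reported as a problem.

```perl
use strict;
use warnings;

# The variables suggested above. Windows treats environment variable names
# case-insensitively, so they are matched case-insensitively here as well.
my @PROXY_VARS = qw(HTTP_proxy HTTP_proxy_user HTTP_proxy_pass);

# Hypothetical helper: return a list of problems in an environment hash such
# as \%ENV. Only the http:// prefix rule is checked; an unset HTTP_proxy is
# not reported, since a direct connection needs no proxy.
sub proxy_env_problems {
    my ($env) = @_;
    my %by_lc = map { lc($_) => $env->{$_} } keys %$env;
    my @problems;
    my $proxy = $by_lc{http_proxy};
    if (defined $proxy && $proxy !~ m{^http://}i) {
        push @problems, "HTTP_proxy ('$proxy') is missing the http:// prefix";
    }
    return @problems;
}

# Say which variables are set, without echoing any values (one is a password).
my %env_lc = map { lc($_) => $ENV{$_} } keys %ENV;
for my $var (@PROXY_VARS) {
    printf "%-16s %s\n", $var, defined $env_lc{ lc $var } ? 'set' : 'not set';
}
print "$_\n" for proxy_env_problems(\%ENV);
```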
      The OP's code didn't work for me either. At first I thought it was most likely a problem on the remote side, since the server may decide not to respond to unknown user agents. So I rewrote the OP's code as follows:
      use strict;
      use LWP::UserAgent;

      my $lwp = LWP::UserAgent->new( agent => 'Mozilla/5.0' );
      my $response = $lwp->get("http://kompas.com/kompas-cetak/0505/09/metro/index.htm");
      if ( $response->is_error ) {
          print "Error: ", $response->as_string();
      }
      else {
          print "Success: ", $response->as_string();
      }
      but I get this error: 500 (Internal Server Error) Can't connect to kompas.com:80 (Bad hostname 'kompas.com').
      As far as I understand it, this means the host name could not be resolved. I also tried "http://google.de", with the same outcome.

      I am behind a proxy, and I have the environment set correctly.

      The strange thing: I have no trouble with ppm.


      holli, /regexed monk/
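The Bad hostname message can be reproduced outside LWP: IO::Socket::INET turns the host name into an address with inet_aton from the core Socket module and reports Bad hostname when that fails. A minimal sketch, assuming a hypothetical helper resolve_host and file name resolvecheck.pl. Note that behind a proxy (as holli is) a failing direct lookup is not necessarily a fault, because the proxy normally resolves names on the client's behalf.

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa);

# Hypothetical helper: turn a host name into a dotted-quad IPv4 address with
# inet_aton, the kind of lookup behind "Bad hostname". Returns undef on failure.
sub resolve_host {
    my ($host) = @_;
    my $packed = inet_aton($host);
    return defined $packed ? inet_ntoa($packed) : undef;
}

# Usage (resolvecheck.pl is a hypothetical file name):
#   perl resolvecheck.pl kompas.com google.de
print "usage: perl resolvecheck.pl HOST...\n" unless @ARGV;
for my $host (@ARGV) {
    my $addr = resolve_host($host);
    if (defined $addr) {
        print "$host -> $addr\n";
    }
    else {
        print "$host: cannot be resolved\n";
    }
}
```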
      This is a version that works for me now. Hope it helps a bit. (I've shamelessly stolen the relevant code from PPM.pm :-)
      use strict;
      use LWP::UserAgent;
      use HTTP::Request;

      my $href = "http://kompas.com/kompas-cetak/0505/09/metro/index.htm";
      my $ua = LWP::UserAgent->new;
      my $request = HTTP::Request->new(GET => $href);

      # Pick up the proxy from the environment (as PPM.pm does) only if one is set
      if (defined $ENV{HTTP_proxy}) {
          $ua->env_proxy;
          $request->proxy_authorization_basic($ENV{HTTP_proxy_user}, $ENV{HTTP_proxy_pass});
      }

      my $response = $ua->request($request);
      if ($response && $response->is_success) {
          print "Success!\n", $response->content;
      }
      else {
          print "Failed!\n", $response->as_string;
      }


      holli, /regexed monk/
        Hmm, apparently not.
        Failed!
        500 (Internal Server Error) Can't connect to kompas.com:80 (Bad protocol 'tcp')
        Content-Type: text/plain
        Client-Date: Tue, 17 May 2005 14:09:43 GMT
        Client-Warning: Internal response
        500 Can't connect to kompas.com:80 (Bad protocol 'tcp')
        "We shall peck them to death tomorrow, my dear."
      I'm pretty sure I'm not behind a proxy. The only thing of that nature I can think of that might be causing problems is a router, but I really doubt that's it.


      "We shall peck them to death tomorrow, my dear."
        Umm, I cannot reproduce the problem at my end; everything worked perfectly. I'll leave it to more enlightened monks to solve this problem.

Re: Problems with LWP
by radiantmatrix (Parson) on May 17, 2005 at 15:10 UTC

    It seems like a networking error to me. Are you behind a proxy, perchance? The Bad protocol 'tcp' error is telling you that something can't use the TCP protocol; the most common cause of this, if other things work, is that you haven't made LWP aware of your proxy settings.

    On Windows, IIRC, LWP should respect the environment variables http_proxy (in the form http://proxy-server.tld:2222, where 2222 is replaced by your proxy port), http_proxy_user, and http_proxy_pass.


    The Eightfold Path: 'use warnings;', 'use strict;', 'use diagnostics;', perltidy, CGI or CGI::Simple, try the CPAN first, big modules and small scripts, test first.
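As far as I can tell, the Bad protocol 'tcp' text comes from IO::Socket::INET, which looks the protocol name up with Perl's getprotobyname and gives up when the lookup finds nothing. On Windows that lookup typically reads the protocol file in %SystemRoot%\system32\drivers\etc (on Unix, /etc/protocols), so a damaged TCP/IP installation is one plausible cause, consistent with the reinstall the OP reports further down. A minimal check, assuming a hypothetical helper protocol_number:

```perl
use strict;
use warnings;

# Hypothetical helper: look a protocol up in the system protocol database.
# In list context getprotobyname returns (name, aliases, number), or an empty
# list when there is no entry, in which case this returns undef.
sub protocol_number {
    my ($proto) = @_;
    my ($name, $aliases, $number) = getprotobyname($proto);
    return $number;
}

my $tcp = protocol_number('tcp');
if (defined $tcp) {
    print "tcp lookup OK: protocol number $tcp\n";
}
else {
    print "tcp lookup FAILED: no 'tcp' entry in the protocol database\n";
}
```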

Re: Problems with LWP
by bmann (Priest) on May 17, 2005 at 16:56 UTC
      Update: Problem is completely solved. It turns out the fix required both reinstalling TCP/IP and setting a browser-like user agent. Yeesh.

      "We shall peck them to death tomorrow, my dear."
Re: Problems with LWP
by TheStudent (Scribe) on May 17, 2005 at 12:28 UTC
    Hmmm, runs fine for me on ActivePerl on Win XP.
    Do you have a proxy you need to go through?
    Can you browse the URL in a browser?