I recently had a frustrating debugging session in which, for a program using LWP::Simple,
my $result = get($url);
was assigning undef to $result, yet
getprint($url);
was getting and printing $url's HTML.
Digging into the code, I found out why.
getprint() (and getstore() and head()) drives HTTP::Request. But ever since libwww-perl 5.15, released November 6, 1997, get() normally doesn't. LWP::Simple rolls its own super-lightweight stand-in for HTTP::Request, _trivial_http_request, and that's what get() normally uses.
And while getprint() (etc.) uses a user agent like "LWP::Simple/5.79" (the number is libwww-perl's version) and protocol HTTP 1.1, _trivial_http_request uses "lwp-trivial/1.40" (LWP::Simple's version) and protocol HTTP 1.0. Since robots.txt rules are keyed on the user agent's name, a robots.txt that allows getprint() can forbid get().
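To see how rules keyed on the agent name can split the two code paths, here is a minimal offline sketch using WWW::RobotRules (a separate module, assumed to be installed). The robots.txt content, the example.com URLs and the allowed_for() helper are all made up for illustration; only the two agent strings come from the discussion above.

```perl
use strict;
use warnings;
use WWW::RobotRules;

# Hypothetical robots.txt that singles out the lightweight client by name.
my $robots_txt = <<'END';
User-agent: lwp-trivial
Disallow: /
END

# Hypothetical helper: would a client announcing $agent be allowed to
# fetch $url under the rules above? Nothing is fetched over the network.
sub allowed_for {
    my ($agent, $url) = @_;
    my $rules = WWW::RobotRules->new($agent);
    $rules->parse('http://example.com/robots.txt', $robots_txt);
    return $rules->allowed($url);
}

for my $agent ('lwp-trivial/1.40', 'LWP::Simple/5.79') {
    my $verdict = allowed_for($agent, 'http://example.com/index.html')
        ? 'allowed'
        : 'forbidden';
    printf "%-18s %s\n", $agent, $verdict;
}
```

WWW::RobotRules strips the version number before matching, so a rule written for "lwp-trivial" catches every release of the lightweight client while leaving "LWP::Simple" alone.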
If you're using a proxy (as determined by looking for the existence of an HTTP_PROXY environment variable), get() will use HTTP::Request. If _trivial_http_request gets an HTTP redirect, it'll switch to using HTTP::Request.
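If you want to know in advance which path a script is likely to take, you can run the same kind of check on the environment yourself. This is only a sketch: proxy_env_present() is a hypothetical helper, not part of LWP::Simple, and it assumes the capitalisation of the variable name doesn't matter.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of LWP::Simple: reports whether the given
# environment defines an HTTP proxy variable under any capitalisation
# (an assumption of this sketch).
sub proxy_env_present {
    my ($env) = @_;    # hash reference, e.g. \%ENV
    return scalar( grep { lc($_) eq 'http_proxy' } keys %$env );
}

print proxy_env_present( \%ENV )
    ? "proxy variable set: get() should take the full LWP path\n"
    : "no proxy variable: get() may take the lightweight path\n";
```

A redirect can still move a request onto the full path at run time, so this tells you only where get() starts out.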
You can also import $ua, the LWP::UserAgent object LWP::Simple uses; as a side effect, that guarantees get() always drives HTTP::Request. Remember that if you're specifying a list to import, the module's @EXPORT list won't be exported by default -- it's now incumbent upon you to include all the names you want imported.
use LWP::Simple qw($ua get);
I'm writing a doc patch to make some of this clearer; the maintainer, Gisle Aas, has verified that importing $ua is the only officially supported technique to force get() to use HTTP::Request.
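Here is a slightly fuller sketch of that technique. Since $ua is an ordinary LWP::UserAgent, you can configure it, and you can also ask it why a fetch failed, which get() alone won't tell you. The agent string and the fetch_or_explain() helper are hypothetical names chosen for this example.

```perl
use strict;
use warnings;

# Importing $ua routes get() through the full LWP stack. Because an
# explicit import list is given, get must be named as well.
use LWP::Simple qw($ua get);

# $ua is a plain LWP::UserAgent, so the usual setters apply.
$ua->agent('my-fetcher/0.1');    # hypothetical agent string
$ua->timeout(10);                # seconds

# Hypothetical helper: get() returns undef on any failure, so on failure
# repeat the request through $ua to learn what the server answered.
sub fetch_or_explain {
    my ($url) = @_;
    my $html = get($url);
    return $html if defined $html;

    my $response = $ua->get($url);    # second request, only for diagnosis
    return $response->decoded_content if $response->is_success;
    die "Could not fetch $url: ", $response->status_line, "\n";
}

# Fetch only when a URL is given on the command line.
print fetch_or_explain( $ARGV[0] ) if @ARGV;
```

The diagnostic request costs a second round trip, so in a script where that matters you might call $ua->get() directly and skip get() altogether.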
Updated: linkified module names.