PerlMonks
LWP::Simple: a little more complicated than it sounds
by Zed_Lopez (Chaplain) on Dec 05, 2004 at 23:12 UTC (#412544=perlmeditation)
I recently had a frustrating debugging session in which, for a program using LWP::Simple, the result of get($url) was undef, yet getprint($url) was getting and printing $url's HTML.

Digging into the code, I found out why. getprint() (and getstore() and head()) drives HTTP::Request. But ever since libwww-perl 5.15, from November 6, 1997, get() normally doesn't. LWP::Simple rolls its own super-lightweight HTTP request routine, _trivial_http_get, and that's what get() normally uses. And while getprint() (etc.) uses a user agent like "LWP::Simple/5.79" (the number is libwww-perl's version) and protocol HTTP 1.1, _trivial_http_get uses "lwp-trivial/1.40" (the number is LWP::Simple's version) and protocol HTTP 1.0. Since the two identify themselves differently, a robots.txt that allows getprint() can forbid get().

There are three cases in which get() goes through HTTP::Request after all. If you're using a proxy (as determined by looking for the existence of an HTTP_PROXY environment variable), get() will use HTTP::Request. If _trivial_http_get gets an HTTP redirect, it'll switch to using HTTP::Request. Or you can import $ua, the LWP::UserAgent object LWP::Simple uses, and, as a side effect, it'll guarantee that get() always drives HTTP::Request.

Remember that if you're specifying a list to import, the module's @EXPORT list won't be exported by default; it's now incumbent upon you to include all the names you want imported.
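For illustration, here is a minimal sketch of the import technique, assuming LWP::Simple is installed. The script layout, the messages it prints, and the idea of taking the URL from the command line are my own choices for the example, not anything LWP::Simple prescribes. It fetches nothing unless you pass it a URL.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Naming $ua in the import list is the documented way to make get()
# go through the full user agent. An explicit list replaces the default
# @EXPORT list, so get() and getprint() have to be named as well.
use LWP::Simple qw($ua get getprint);

# $ua is the LWP::UserAgent object that LWP::Simple uses internally.
# Printing its agent string shows how requests will identify themselves.
print "user agent: ", $ua->agent, "\n";

# Illustrative only: the URL comes from the command line, and nothing
# is fetched if none is given.
my $url = shift @ARGV;

if (defined $url) {
    my $result = get($url);
    if (defined $result) {
        print "get() returned ", length($result), " characters\n";
    }
    else {
        print "get() returned undef for $url\n";
    }
}
```

Leaving get out of the list, as in use LWP::Simple qw($ua), would import $ua alone, and a later get($url) would fail to compile under strict.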
I'm writing a doc patch to make some of this clearer; the maintainer, Gisle Aas, has verified that importing $ua is the only officially supported technique to force get() to use HTTP::Request.

Updated: linkified module names.