cdherold has asked for the wisdom of the Perl Monks concerning the following question:
Pretty simple ... but all of a sudden I've come upon a URL for which this is not working. I've cross-tested the code with URLs that work and then put in this new site URL (http://wire.ap.org/APnews/center_minor.html?FRONTID=SCIENCE), but all it prints out is one big blank.
    use LWP::Simple;
    $url = "http://www.whatever.com";
    $body = get($url);
    print $body;
Has anyone seen this before? Is it possible that there is some security system on this page that will not allow it to be retrieved?
cdherold
|
(crazyinsomniac) Re: LWP::SIMPLE fails on certain URL
by crazyinsomniac (Prior) on Jan 27, 2002 at 15:51 UTC | ||
A simple LWP::Simple::get($url) didn't work for me either, even though I could see the page in my browser, but LWP::Simple::getstore did, so I debugged that too. I'd definitely say this is a bug in LWP::UserAgent; a ++ to anyone who takes the time to figure out where it is.
Below is sub LWP::UserAgent::request, which is where LWP::Simple::get seems to fail. But first, here is the debug line in sub request that gives us the interesting error:
update: I have $LWP::Simple::VERSION = 1.33; and $LWP::VERSION = 5.51;
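Since the replies in this thread report different results with different library versions, it helps to print what is actually installed before comparing notes. A minimal sketch; `version_report` is a hypothetical helper name, not part of LWP:

```perl
use strict;
use warnings;
use LWP;              # defines $LWP::VERSION (the libwww-perl version)
use LWP::Simple ();   # defines $LWP::Simple::VERSION; import nothing

# Hypothetical helper: format both version numbers on one line.
sub version_report {
    return sprintf 'LWP::Simple %s, libwww-perl %s',
        $LWP::Simple::VERSION, $LWP::VERSION;
}

print version_report(), "\n";
```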
by shotgunefx (Parson) on Jan 28, 2002 at 03:15 UTC | ||
I'll ++ everything the person who fixes this writes. I spent over a month trying to figure it out. Drove me #!@#! nuts!
-Lee
"To be civilized is to deny one's nature."
|
Re: LWP::SIMPLE fails on certain URL
by grep (Monsignor) on Jan 27, 2002 at 14:02 UTC | ||
This looks like a redirect to me. I would suggest looking at the page that it redirects to (preferably the non-JavaScript one) or finding out from the administrators of the site where you should be looking.
grep
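One way to check the redirect theory is to ask LWP::UserAgent for the whole response instead of just the body, then list each HTTP redirect hop. This is only a sketch: `describe_response` is a hypothetical helper, it assumes a libwww-perl recent enough to provide `$ua->get` and Perl 5.10 or later for `//`, and it shows HTTP-level redirects only (a JavaScript or meta-refresh redirect will not appear).

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: list every redirect hop, then the final status
# line and the size of the body that came back.
sub describe_response {
    my ($response) = @_;
    my @lines;
    for my $hop ($response->redirects) {
        push @lines, sprintf 'redirect: %s -> %s',
            $hop->code, $hop->header('Location') // '(no Location header)';
    }
    push @lines, 'final status: ' . $response->status_line;
    push @lines, 'body length: ' . length($response->content // '');
    return join "\n", @lines;
}

# Fetch only when a URL is given on the command line.
if (@ARGV) {
    my $ua = LWP::UserAgent->new(timeout => 30);
    print describe_response($ua->get($ARGV[0])), "\n";
}
```

A successful status with a body length of 0 would point at the server sending an empty page on purpose, rather than at a failed request.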
by cdherold (Monk) on Jan 27, 2002 at 14:21 UTC | ||
by grep (Monsignor) on Jan 27, 2002 at 14:29 UTC | ||
If this helps: $LWP::Simple::VERSION = 1.35
Update: I ran it a couple more times and I continue to get content.
grep
by cdherold (Monk) on Jan 27, 2002 at 14:42 UTC | ||
|
Re: LWP::Simple fails on certain URL
by particle (Vicar) on Jan 27, 2002 at 19:08 UTC | ||
my script:
produces the following output:
with the default distribution from ActiveState in build 631. i downloaded and installed libwww-perl-5.63 in an alternate directory and prepended it to @INC, yielding:
so it looks like an upgrade will do you good.
~Particle
|
Re: LWP::SIMPLE fails on certain URL
by Zaxo (Archbishop) on Jan 27, 2002 at 14:06 UTC | ||
After Compline, | [reply] [d/l] | |
|
Re: LWP::SIMPLE fails on certain URL
by dws (Chancellor) on Jan 27, 2002 at 14:46 UTC | ||
Yes. It's also possible that the site is responding to the User-Agent: header, though that seems unlikely since others using LWP have been able to fetch this page. You might give it a try, though. To set User-Agent: yourself, you'll need to use LWP::UserAgent instead of LWP::Simple.
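A minimal sketch of setting the User-Agent: header with LWP::UserAgent. The helper name `make_agent` and the browser string are illustrative assumptions, and `$ua->get` assumes a reasonably recent libwww-perl:

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: build a user agent that sends the given
# User-Agent: string instead of the default "libwww-perl/x.y".
sub make_agent {
    my ($agent_string) = @_;
    my $ua = LWP::UserAgent->new(timeout => 30);
    $ua->agent($agent_string);
    return $ua;
}

# Fetch only when a URL is given on the command line.
if (@ARGV) {
    # Illustrative browser string; any value can be substituted.
    my $ua = make_agent('Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)');
    my $response = $ua->get($ARGV[0]);
    if ($response->is_success) {
        print $response->content;
    }
    else {
        warn 'Request failed: ', $response->status_line, "\n";
    }
}
```

Unlike LWP::Simple::get, this reports the status line on failure instead of silently returning nothing.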
|
Re: LWP::SIMPLE fails on certain URL
by shotgunefx (Parson) on Jan 28, 2002 at 03:03 UTC | ||
Here's a workaround: prepend a space to the URL. It won't use _trivial_get that way. (Don't ask me why I thought to try this, Zen I guess.) This drove me nuts for quite some time. I never was able to understand why it happens. It happens only with get, not getstore or getprint.
-Lee
"To be civilized is to deny one's nature."
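A possible explanation for why the leading space matters, offered as a sketch rather than a diagnosis. The assumption here is that older LWP::Simple versions chose their lightweight fetch path by matching the URL against a pattern anchored at `^http://`; the pattern below is an approximation written for illustration, not copied from the module, and `takes_fast_path` is a hypothetical name.

```perl
use strict;
use warnings;
use LWP::Simple qw(get);

# ASSUMPTION: approximates the test older LWP::Simple used to pick its
# lightweight fetch path. Not taken from the module source.
my $FAST_PATH = qr{^http://([^/:\@]+)(?::(\d+))?(/\S*)?$};

# Hypothetical helper: would this URL string match the anchored pattern?
sub takes_fast_path {
    my ($url) = @_;
    return $url =~ $FAST_PATH ? 1 : 0;
}

# Fetch only when a URL is given on the command line.
if (@ARGV) {
    my $url = $ARGV[0];
    printf "plain URL matches:  %s\n", takes_fast_path($url)    ? 'yes' : 'no';
    printf "padded URL matches: %s\n", takes_fast_path(" $url") ? 'yes' : 'no';

    # The workaround itself: note the space inside the quotes.
    my $body = get(" $url");
    print defined $body ? $body : "get returned undef\n";
}
```

Under that assumption, the padded URL no longer matches, so get falls through to the full LWP::UserAgent machinery, which is what getstore and getprint use anyway.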
|
Re: LWP::SIMPLE fails on certain URL
by screamingeagle (Curate) on Jan 27, 2002 at 14:20 UTC | ||
by cdherold (Monk) on Jan 27, 2002 at 14:30 UTC | ||
Is there something I'm missing here? I wouldn't think so, because when I do this for other pages they all come out fine. Hmm ... still a little confused.
by blakem (Monsignor) on Jan 27, 2002 at 14:42 UTC | ||
-Blake
|
It's not your fault... Or LWP's fault.
by joealba (Hermit) on Jan 28, 2002 at 08:47 UTC | ||
This referrer check is most likely causing the troubles you're having here. If you find a way to get past that, the user agent check, cookies, JavaScript test, and the funny frames will also make things interesting. For an example of what happens when wire.ap.org doesn't see a happy shiny user agent, use Netscape and turn off JavaScript support, then go to that URL. It gives you nothing. For an example of the framing, go to projo.com and click on one of the stories in the "Top Stories from the AP" box. See the lovely frame at the top? Oooohh.. Aaaahhh.. From what I gathered after speaking with the webmaster, there's all kinds of funky stuff going on with this site to keep people from scrubbing it for news.
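For anyone who wants to see what a referrer check does to a request, here is a minimal sketch of sending a Referer: header with LWP. `build_request` is a hypothetical helper and the referring URL is a placeholder, not a value known to satisfy this site. It addresses only the referrer check; the cookie, JavaScript, and frame handling mentioned above are not covered.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

# Hypothetical helper: build a GET request that carries a Referer: header.
sub build_request {
    my ($url, $referer) = @_;
    my $request = HTTP::Request->new(GET => $url);
    $request->header(Referer => $referer);
    return $request;
}

# Fetch only when a URL is given on the command line.
if (@ARGV) {
    # HYPOTHETICAL referring page; substitute a page that really links there.
    my $request  = build_request($ARGV[0], 'http://www.example.com/');
    my $ua       = LWP::UserAgent->new(timeout => 30);
    my $response = $ua->request($request);
    print $response->status_line, "\n";
    print $response->content if $response->is_success;
}
```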