George_Sherston has asked for the wisdom of the Perl Monks concerning the following question:

Sibling monks, I humbly request guidance. I am making a news crawler, and I use LWP::Simple to get selected index pages from which I take links. But some of these pages use ASP, and I think that sometimes breaks stuff... or maybe it's something else. When I do
use LWP::Simple;
my $html = get 'http://www.iran-news.com/asp/view_iran.asp?Id=INTERNATIONAL&id2=INTERNATIONAL00.jpg';
I get nothing. Even though cutting and pasting the link into the browser brings up a genuine page, and although
$html = get 'http://www.gulf-news.com/news/2001/1219/news_world.asp';
produces a page.

Can anybody tell me what I can do to make the first one work?

§ George Sherston

Replies are listed 'Best First'.
Re: LWP::Simple versus ASP ?
by robin (Chaplain) on Dec 21, 2001 at 17:09 UTC
    The problem is that LWP::Simple uses a very simple hand-rolled HTTP client, which doesn't support cookies at all. You can tell LWP::Simple to use the full LWP implementation instead, by adding this line to your program (before the get):
    $LWP::Simple::FULL_LWP = 1;
    With that addition, your program works fine for me.
      Thanks - that helps - I would never have got there on my own. I no longer get a "Use of uninitialized value in print" warning. But aggravatingly I find that $html is empty. I'm completely puzzled by that. When you ran it, did you find that you got a page (albeit mutilated by broken links) when you printed $html? I just get a blank space - but only with that url; other urls give me a page.

      § George Sherston
        Is this running from a CGI script? I do get a page, and it's a frameset. That would look blank viewed in a web browser, because the frames are specified using relative links.

        If you mean that the $html string is literally empty, try adding the line

        use LWP::Debug qw(+);
        before the get() call. You'll get a detailed trace of what LWP is doing, which should help you to diagnose the problem.
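
A frameset whose frames use relative links, as described in the reply above, can be followed by resolving each frame's src against the URL the frameset came from. The following is only a minimal sketch of that idea, not what anyone in the thread actually ran: the helper name frame_urls, the sample frameset and the example.com base URL are all made up for illustration. It relies on HTML::LinkExtor and URI, which are normally installed alongside LWP.

```perl
use strict;
use warnings;
use HTML::LinkExtor;
use URI;

# Hypothetical helper: return the absolute URL of every <frame src="...">
# in $html, resolved against $base (the URL the frameset was fetched from).
sub frame_urls {
    my ($html, $base) = @_;
    my @urls;
    my $parser = HTML::LinkExtor->new(sub {
        my ($tag, %attr) = @_;
        push @urls, URI->new_abs($attr{src}, $base)->as_string
            if $tag eq 'frame' && defined $attr{src};
    });
    $parser->parse($html);
    $parser->eof;
    return @urls;
}

# Made-up frameset and base URL, for illustration only.
my $sample = '<frameset cols="20%,80%">'
           . '<frame src="menu.asp"><frame src="../body.asp?Id=1">'
           . '</frameset>';
print "$_\n" for frame_urls($sample, 'http://www.example.com/asp/view.asp');
```

Each URL returned this way could then be passed to get() in turn, so the crawler reads the frame contents rather than the near-empty frameset page.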
Re: LWP::Simple versus ASP ?
by hopes (Friar) on Dec 21, 2001 at 07:12 UTC
    Maybe you're having problems with cookies.
    The first thing an ASP page sends to the browser (or to a client like the one you're coding) is a session cookie (ASPSESSIONID).
    You have to accept this cookie and send it back to show that you can maintain session information.
    On some sites, if you don't accept the cookie, you don't see the page.


    Hopes
    $_=$,=q,\,@4O,,s,^$,$\,,s,s,^,b9,s, $_^=q,$\^-]!,,print
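
If the session cookie is indeed the issue, one way to accept it and send it back is to use LWP::UserAgent with a cookie jar instead of LWP::Simple. This is a sketch under that assumption, not a confirmed fix for the site in question; the helper names make_agent and fetch_page and the 30-second timeout are illustrative choices.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;

# Build a user agent that keeps cookies in memory and sends them back
# on later requests, including requests that follow a redirect.
sub make_agent {
    my $ua = LWP::UserAgent->new(timeout => 30);
    $ua->cookie_jar(HTTP::Cookies->new);
    return $ua;
}

# Fetch $url and return the body, or undef (with a warning) on failure.
sub fetch_page {
    my ($ua, $url) = @_;
    my $response = $ua->get($url);
    unless ($response->is_success) {
        warn "GET $url failed: ", $response->status_line, "\n";
        return;
    }
    return $response->decoded_content;
}

# Only touch the network when a URL is given on the command line.
if (@ARGV) {
    my $html = fetch_page(make_agent(), $ARGV[0]);
    print $html if defined $html;
}
```

Reusing the same agent for every page on a site lets a cookie set by the first response accompany the later requests.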
Re: LWP::Simple versus ASP ?
by Steve_p (Priest) on Dec 21, 2001 at 20:47 UTC
    You may want to try the following:
    use LWP::Simple q/getprint/; my $status = getprint 'http://www.iran-news.com/asp/view_iran.asp?Id=INTERNATIONAL&id2=INTERNATIONAL00.jpg'; # getprint returns the HTTP status code, not the page
    getprint() will print out any errors that you get back. This should help with any troubleshooting.
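
Since getprint() returns the HTTP response code, that code can also be logged in readable form. The sketch below assumes this is useful in a crawler; describe_status is a hypothetical helper name, while is_success and status_message are taken from HTTP::Status.

```perl
use strict;
use warnings;
use LWP::Simple qw(getprint);
use HTTP::Status qw(is_success status_message);

# Hypothetical helper: turn a numeric HTTP status code into a readable line.
sub describe_status {
    my ($code) = @_;
    my $text = status_message($code) // 'Unknown status';
    return sprintf '%d %s (%s)', $code, $text,
        is_success($code) ? 'ok' : 'failed';
}

# Only touch the network when a URL is given on the command line.
if (@ARGV) {
    # Prints the page to STDOUT on success; on failure the status
    # code and message go to STDERR.
    my $code = getprint($ARGV[0]);
    warn describe_status($code), "\n";
}
```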