mkurtis has asked for the wisdom of the Perl Monks concerning the following question:

Here's my code:

#!/usr/bin/perl -w
use LWP::RobotUA;
my $ua = LWP::RobotUA->new('theusefulbot', 'bot@theusefulnet.com');
my $content = $ua->get("http://www.yahoo.com");
$ua->delay(10/600);
print $content;
Here's what I get when I run it:
HTTP::Response=HASH(0x82111c0)
I was using RobotUA as part of a larger crawler, but when that didn't work I made this script to test what it was "getting". I have just reinstalled the libwww-perl-5.76 distribution it came packaged in, and that went through fine, but I still get these errors. The funny thing is that this script should have worked: a week ago I wrote the same thing to test what the module did, and $content gave me the HTML, not HTTP::Response=HASH(0x82111c0). What do I do?

Thanks

Replies are listed 'Best First'.
Re: RobotUA not working
by Prior Nacre V (Hermit) on Mar 07, 2004 at 01:51 UTC

    I've checked the documentation and other than the fact that you've reported libwww5.76 and CPAN shows libwww-perl-5.76, I can find no discrepancy between what you have here and what the documentation says you should have.

    LWP::RobotUA has a very similar example to what you have. You may want to increase your delay somewhat: you have 1 second; default is 1 minute; doco suggests 10 minutes is being nice! Regardless of this, the syntax and return value are as expected.
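    To make the units concrete: delay() is specified in minutes, so 10/600 works out to one second. A minimal sketch of a more polite setup (reusing the agent name and address from the question) might be:

    ```perl
    #!/usr/bin/perl -w
    use strict;
    use LWP::RobotUA;

    my $ua = LWP::RobotUA->new('theusefulbot', 'bot@theusefulnet.com');

    # delay() takes minutes, not seconds: 10/600 min = 1 second.
    # The default is 1 minute; set it explicitly here.
    $ua->delay(1);

    print $ua->delay, "\n";   # current delay, in minutes
    ```
    
    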

    LWP::UserAgent states the get() method returns a HTTP::Response object: exactly what you have here.

    The HTTP::Response documentation describes the methods this object can invoke. Have a read to find out about retrieving header/content/status/etc. information.
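    A rough sketch of the usual accessors, constructing a response by hand purely for illustration ($ua->get(...) hands back an object just like this one):

    ```perl
    #!/usr/bin/perl -w
    use strict;
    use HTTP::Response;

    # A hand-built response standing in for what get() would return
    my $response = HTTP::Response->new(
        200, 'OK',
        [ 'Content-Type' => 'text/html' ],
        '<html><body>hello</body></html>',
    );

    print $response->code, "\n";                    # status code: 200
    print $response->is_success, "\n";              # true for 2xx: 1
    print $response->header('Content-Type'), "\n";  # text/html
    print $response->content, "\n";                 # the body itself
    ```
    
    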

    I can't comment on what would have happened last week: you haven't provided details and I wasn't there.

    PN5

      Last week it returned HTML, so that printing $content gave me the page source. I was relying on it doing this so I could parse it for links, but that didn't happen. Thanks for the HTTP::Response referral, I'll try that.
      Thanks again

        There are a few things you can check. First, compare your old code with your new script. Either of the following two code fragments would result in the behaviour you're reporting. (Assuming, of course, a content-type of text/html.)

        # Chaining methods
        $content = $ua->get($uri)->content;
        print $content;

        # Providing content to 'print()' via method
        print $content->content;

        Another possibility, albeit fairly remote, is that your reinstallation of the modules overwrote some customisations in the previous installation. (Never discount even remote possibilities, as swngnmonk found in his hell which is debugging, and as I can attest to from personal experience.)

        Lastly, I'd suggest renaming the variable $content (perhaps to $response) to avoid any confusion with the method content() and also to more accurately reflect the data it holds.
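        Putting those suggestions together, a sketch of the test script with the renamed variable (same URL as the original, delay set before the fetch) might look like:

        ```perl
        #!/usr/bin/perl -w
        use strict;
        use LWP::RobotUA;

        my $ua = LWP::RobotUA->new('theusefulbot', 'bot@theusefulnet.com');
        $ua->delay(1);                                    # minutes between requests

        my $response = $ua->get('http://www.yahoo.com');  # an HTTP::Response object
        if ($response->is_success) {
            print $response->content;                     # the HTML, not the object
        }
        else {
            warn 'Request failed: ', $response->status_line, "\n";
        }
        ```
        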

        PN5

Re: RobotUA not working
by tachyon (Chancellor) on Mar 07, 2004 at 15:16 UTC

    Definitely not directly related, but neither Yahoo nor Google nor Alexa nor ... like being spidered. They have some very serious anti-DoS/spider firewalling happening. To spider sites like these is a challenge, although it can of course be done. For a start, don't admit to being a robot: pretend to be IE, or if you want to pretend to be a robot, make yourself Googlebot or one of the well-known spiders that people welcome.

    I presume you are aware of robots.txt and what it does.
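    For what it's worth, LWP::RobotUA already consults robots.txt for you via WWW::RobotRules. A standalone sketch of how those rules get applied, using a hypothetical robots.txt for example.com:

    ```perl
    #!/usr/bin/perl -w
    use strict;
    use WWW::RobotRules;

    my $rules = WWW::RobotRules->new('theusefulbot');

    # Hypothetical robots.txt: everyone is barred from /search
    my $robots_txt = <<'TXT';
    User-agent: *
    Disallow: /search
    TXT

    $rules->parse('http://www.example.com/robots.txt', $robots_txt);

    print $rules->allowed('http://www.example.com/index.html') ? "yes\n" : "no\n";
    print $rules->allowed('http://www.example.com/search?q=x') ? "yes\n" : "no\n";
    ```
    
    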

    cheers

    tachyon

      Thanks tachyon, however the whole point of using RobotUA was to follow the rules. I could have used Mechanize and pretended to be a browser, but they can block you in their .htaccess file for that. Anyway, Googlebot isn't written in Perl and I don't think they would buy that it was Googlebot anyway; besides, some sites let Googlebot index less than other bots. Thanks for the post though; if my RobotUA delay feature doesn't stop taking minutes, I might just resort to Mechanize and your idea.
      Thanks
Re: RobotUA not working
by hsinclai (Deacon) on Mar 07, 2004 at 16:04 UTC
    Try:
    my $ua = LWP::RobotUA->new('myagentname', 'address@a.tld');
    $ua->delay(4);
    my $response = $ua->get('http://bla/../');
    print $response->content;
    $response has different parts to it; you must ask for $response->content to get the fetched data, or you get this business: HTTP::Response=HASH(0x82111c0)
    $Id: .signature,v .99 Sun May 12 19:48:45 2002 hsinclai Exp $ Program terminated {7} abnormal CONTACT YOUR SUPERVISOR