Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Is there a way to have WWW::Mechanize honor robots.txt?

It appears that WWW::Mechanize and LWP::RobotUA are both independently derived from LWP::UserAgent. There is, therefore, no coupling between them other than a common superclass. That doesn't allow Mech to honor robots.txt in any obvious (to me) way.

I think my problem would be solved if I could have WWW::Mechanize inherit from LWP::RobotUA instead of LWP::UserAgent.

Perhaps there's another way to get it to work.

Replies are listed 'Best First'.
Re: How to make WWW::Mechanize honor robots.txt (possibly via LWP::RobotUA)
by ikegami (Patriarch) on Nov 12, 2009 at 03:18 UTC

    WWW::Mechanize is wrong to inherit from LWP::UserAgent. It should encapsulate it. That means you need to resort to the following hack:

    use LWP::RobotUA   qw( );
    use WWW::Mechanize qw( );

    BEGIN {
        package WWW::Mechanize;
        our @ISA;
        @ISA = (
            'LWP::RobotUA',
            ( grep $_ ne 'LWP::UserAgent' && $_ ne 'LWP::RobotUA', @ISA ),
        );
    }

    Note that this will affect all WWW::Mechanize objects in your process.
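
To see why swapping the parent class is enough, and why it is process-wide, here is a minimal sketch using stand-in classes rather than the real modules. The packages My::Agent, My::PoliteAgent and My::Browser are hypothetical placeholders for LWP::UserAgent, LWP::RobotUA and WWW::Mechanize, and the "/private/" rule is an invented substitute for a robots.txt check.

```perl
use strict;
use warnings;

# Hypothetical stand-in for LWP::UserAgent: performs the "request".
package My::Agent;
sub new     { my ($class, %args) = @_; return bless {%args}, $class }
sub request { my ($self, $url) = @_; return "fetched $url" }

# Hypothetical stand-in for LWP::RobotUA: vets each request first.
package My::PoliteAgent;
our @ISA = ('My::Agent');
sub request {
    my ($self, $url) = @_;
    # Invented rule standing in for a robots.txt lookup.
    return "blocked $url" if $url =~ m{/private/};
    return $self->SUPER::request($url);
}

# Hypothetical stand-in for WWW::Mechanize: inherits the plain agent.
package My::Browser;
our @ISA = ('My::Agent');
sub get { my ($self, $url) = @_; return $self->request($url) }

package main;

my $browser = My::Browser->new;
my $before  = $browser->get('/private/page');   # dispatches to My::Agent

# Same @ISA rewrite as the hack above, applied to the stand-ins.
@My::Browser::ISA = (
    'My::PoliteAgent',
    grep { $_ ne 'My::Agent' && $_ ne 'My::PoliteAgent' } @My::Browser::ISA,
);

# The object created before the rewrite is rerouted as well.
my $after  = $browser->get('/private/page');
my $public = $browser->get('/public/page');

print "$before\n$after\n$public\n";
```

Because method lookup walks @ISA at call time, even the object built before the rewrite starts going through the polite class, which is the "affects all objects" caveat in miniature. One further caveat for the real modules: as far as I know, LWP::RobotUA's constructor requires a from address in addition to agent, so after the hack WWW::Mechanize->new would probably need something like from => 'you@example.com' (a placeholder address) passed to it.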

Re: How to make WWW::Mechanize honor robots.txt (possibly via LWP::RobotUA)
by Anonymous Monk on Nov 12, 2009 at 03:27 UTC
    FWIW, IMHO, WWW::Mechanize is not a robot :) Maybe you want Gungho