Kesarion has asked for the wisdom of the Perl Monks concerning the following question:

This is what I'm trying to use and it's not working:

use WWW::Mechanize;
use FileHandle;   # FileHandle->new needs this module loaded

my $fh   = FileHandle->new( "text.txt", "w" );
my $mech = WWW::Mechanize->new( autocheck => 1 );

$mech->get("xxx");
$mech->mirror( $mech->find_image( url_regex => qr/captcha/ )->url_abs, "tokei.jpg" );
$mech->get("xxx");

print "type:";
my $cap = <STDIN>;
chomp $cap;   # strip the trailing newline so the captcha value is clean

$mech->form_id('UserLoginForm');
$mech->field( "data[User][username]", "xxx" );
$mech->field( "data[User][password]", "xxx" );
$mech->field( "captcha", $cap );
$mech->submit();

sleep 3;
$mech->get( "xxx", ':content_file' => "r.htm" );
$mech->dump_text($fh);
$fh->close;

Everything works except for the text dump, which gives an error that I can't read because the window closes too fast. How do I stop that, anyway? I'm new to Perl and I can't continue this program; I searched but found no info. Help, please.

-Edit- dump_headers works, so why doesn't dump_text? I need to see the error, but I don't know how :/

Replies are listed 'Best First'.
Re: Help getting text from website using www mechanize
by Anonymous Monk on Jan 28, 2011 at 04:12 UTC
    Everything works except for the text dump which gives an error which I can't see because it's going too fast, how do I stop that thing anyway?

    How do you know it's an error if you can't see it? What goes by too fast?

    perl foo.pl 1>stdout.txt 2>stderr.txt

      I'm just guessing, since I can only see the message for a split second. Thanks for the hint; stderr.txt now contains:

      Can't locate HTML/TreeBuilder.pm in @INC (@INC contains: C:/strawberry/perl/site/lib C:/strawberry/perl/vendor/lib C:/strawberry/perl/lib .) at C:/strawberry/perl/site/lib/WWW/Mechanize.pm line 662, <STDIN> line 1.

      I'm not sure what to do.

      Oh wait: I installed HTML::TreeBuilder and it runs now, but all I'm getting is a return character... so much for that.

      Does anyone know how to get text off a website using Mechanize?

        When I write these web automation things, the first step is to be able to get the HTML of the page I want. You can save the resulting HTML from LWP/Mechanize to a file and then open that file in Firefox to make sure you're getting the same content you see when you visit the page in the browser. Have you passed this hurdle yet?

        Then the question becomes: how do I get what I want out of this HTML? That is an application-specific thing. If it is really easy, I just write a regex. HTML::Parser is one option; HTML::TreeBuilder is another.
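        For instance, a minimal sketch of pulling the text out of HTML with HTML::TreeBuilder (which Mechanize itself requires for dump_text). The HTML string here is just a stand-in for a page you saved:

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Inline HTML standing in for a page fetched with Mechanize
my $html = '<html><body><h1>Login</h1><p>Welcome back, user!</p></body></html>';

# Parse the string into a tree and extract all text nodes
my $tree = HTML::TreeBuilder->new_from_content($html);
my $text = $tree->as_text;   # the page's text content, concatenated
$tree->delete;               # free the parse tree

print $text, "\n";
```

        From there, matching out the specific piece you want is usually a regex or a walk over the tree with look_down.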

        - it works but, all I'm getting is a return character... so much for that.

        $mech->get( $uri )
        NOTE: Because :content_file causes the page contents to be stored in a file instead of the response object, some Mech functions that expect it to be there won't work as expected. Use with caution.