Kesarion has asked for the wisdom of the Perl Monks concerning the following question:

This is what I'm trying to use and it's not working:

use WWW::Mechanize;
use FileHandle;   # FileHandle->new needs this module loaded

my $fh   = FileHandle->new( "text.txt", "w" );
my $mech = WWW::Mechanize->new( autocheck => 1 );

$mech->get("xxx");
$mech->mirror( $mech->find_image( url_regex => qr/captcha/ )->url_abs, "tokei.jpg" );
$mech->get("xxx");

print "type:";
my $cap = <STDIN>;
chomp $cap;   # strip the trailing newline so the captcha value is clean

$mech->form_id('UserLoginForm');
$mech->field( "data[User][username]", "xxx" );
$mech->field( "data[User][password]", "xxx" );
$mech->field( "captcha", $cap );
$mech->submit();

sleep 3;
$mech->get( "xxx", ':content_file' => "r.htm" );
$mech->dump_text($fh);
$fh->close;

Everything works except for the text dump, which gives an error that I can't read because the window closes too fast. How do I stop that, anyway? I'm new to Perl and I can't continue this program; I searched but found no info. Help, please.

-Edit- dump_headers works, so why doesn't dump_text? I need to see the error, but I don't know how :/

Replies are listed 'Best First'.
Re: Help getting text from website using www mechanize
by Anonymous Monk on Jan 28, 2011 at 04:12 UTC
    Everything works except for the text dump which gives an error which I can't see because it's going too fast, how do I stop that thing anyway?

    How do you know it's an error if you can't see it? What goes by too fast?

    perl foo.pl 1>stdout.txt 2>stderr.txt

      I'm just guessing, since I can only see the message for a split second. Thanks for the hint; stderr.txt now contains:

      Can't locate HTML/TreeBuilder.pm in @INC (@INC contains: C:/strawberry/perl/site/lib C:/strawberry/perl/vendor/lib C:/strawberry/perl/lib .) at C:/strawberry/perl/site/lib/WWW/Mechanize.pm line 662, <STDIN> line 1.

      I'm not sure what to do.

      Oh wait: I installed HTML::TreeBuilder and it runs now, but all I'm getting is a return character... so much for that.

      Does anyone know how to get text off a website using Mechanize?

        When I write these web automation things, the first step is to be able to get the HTML of the page I want. You can save the resulting HTML from LWP/Mechanize to a file and then open that file in Firefox to make sure you're getting the same content you see when you visit the page in the browser. Have you passed this hurdle yet?

        Then the question becomes: how do I get what I want out of this HTML? That is an application-specific thing. If it is really easy, I just write a regex. HTML::Parser is one option; HTML::TreeBuilder is another.
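        For instance, a minimal sketch of pulling the text out of HTML with HTML::TreeBuilder (which Mechanize itself requires for dump_text). The HTML string here is just a stand-in for a page you saved:

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Inline HTML standing in for a page fetched with Mechanize
my $html = '<html><body><h1>Login</h1><p>Welcome back, user!</p></body></html>';

# Parse the string into a tree and extract all text nodes
my $tree = HTML::TreeBuilder->new_from_content($html);
my $text = $tree->as_text;   # the page's text content, concatenated
$tree->delete;               # free the parse tree

print $text, "\n";
```

        From there, matching out the specific piece you want is usually a regex or a walk over the tree with look_down.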

        - it works but, all I'm getting is a return character... so much for that.

        $mech->get( $uri )
        NOTE: Because :content_file causes the page contents to be stored in a file instead of the response object, some Mech functions that expect it to be there won't work as expected. Use with caution.