in reply to Help getting text from website using www mechanize

Everything works except for the text dump, which gives an error I can't read because it scrolls by too fast. How do I stop that?

How do you know it's an error if you can't see it? What goes by too fast? Redirect both output streams to files and read them afterwards:

perl foo.pl 1>stdout.txt 2>stderr.txt

Re^2: Help getting text from website using www mechanize
by Kesarion (Initiate) on Jan 28, 2011 at 04:26 UTC

    I'm just guessing, since I only see the message for a split second. Thanks for the hint; stderr.txt now contains:

    Can't locate HTML/TreeBuilder.pm in @INC (@INC contains: C:/strawberry/perl/site/lib C:/strawberry/perl/vendor/lib C:/strawberry/perl/lib .) at C:/strawberry/perl/site/lib/WWW/Mechanize.pm line 662, <STDIN> line 1.

    I'm not sure what to do.

    Oh wait, I installed HTML::TreeBuilder. It works, but all I'm getting is a return character... so much for that.

    Does anyone know how to get text off a website using Mechanize?

      When I write these web automation scripts, the first step is to get the HTML of the page I want. You can save the HTML that LWP/Mechanize fetches to a file and then open that file in Firefox, to make sure you are getting the same content the browser shows when you visit the page yourself. Have you passed this hurdle yet?
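A minimal sketch of that first step, assuming a WWW::Mechanize recent enough to have save_content(). The sub name save_page and the example URL are made up for illustration; the saved file can then be opened in Firefox and compared with the live page.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper: fetch $url and write the raw HTML to $file, so the
# file can be opened in a browser and compared with the live page.
sub save_page {
    my ($url, $file) = @_;
    my $mech = WWW::Mechanize->new(autocheck => 1);   # die on HTTP errors
    $mech->get($url);
    $mech->save_content($file);   # dies if the file cannot be written
    return;
}

# Usage: perl save_page.pl http://www.example.com/ page.html
save_page(@ARGV) if @ARGV == 2;
```

If the saved file looks different from what the browser shows (missing content, a login page, an error page), sort that out before worrying about extraction.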

      Then the question becomes: how do I get what I want out of this HTML? That is application-specific. If it is really easy, I just write a regex; otherwise HTML::Parser is one option.
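For the extraction step, here is a sketch using HTML::TreeBuilder (which is built on HTML::Parser, and is the module the error above was asking for). It pulls the text out of one element by its id. The helper name text_of_id is hypothetical, and "prima" is only an example id.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: return the text inside the element with id $id,
# or undef if the page has no such element.
sub text_of_id {
    my ($html, $id) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my $elem = $tree->look_down(id => $id);
    my $text = $elem ? $elem->as_text : undef;
    $tree->delete;   # free the tree (older HTML::Tree versions need this)
    return $text;
}

print text_of_id('<div id="prima">Insufficient data.</div>', 'prima'), "\n";
```

Compared with a regex, the tree does not care about attribute order, quoting style or line breaks inside the tag.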

      - it works, but all I'm getting is a return character... so much for that.

      From the WWW::Mechanize documentation for $mech->get( $uri ):
      NOTE: Because :content_file causes the page contents to be stored in a file instead of the response object, some Mech functions that expect it to be there won't work as expected. Use with caution.
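A sketch of getting the text without that pitfall: leave out :content_file so the body stays in the response, then ask Mechanize for a text-only version. The format => 'text' option requires HTML::TreeBuilder, which is presumably why the "Can't locate HTML/TreeBuilder.pm" error showed up. The sub name page_text is hypothetical.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper: return the page at $url with the markup stripped.
sub page_text {
    my ($url) = @_;
    my $mech = WWW::Mechanize->new(autocheck => 1);

    # No ':content_file' option here: with it, the body goes to a file and
    # the response object that content() reads from is left empty.
    $mech->get($url);

    # format => 'text' strips the markup and requires HTML::TreeBuilder.
    return $mech->content(format => 'text');
}

print page_text($ARGV[0]), "\n" if @ARGV;
```

If a copy on disk is also wanted, $mech->save_content($file) after the get() writes one without emptying the response.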

        Wow, you saw right through that code. Thank you very much, that was it.

        I've got one more question, though: it seems Mechanize can't see text generated by JavaScript. Is there any way I can get it?

        For example, here's a bit of code:

        function calculate() {
            if (date_completate(0)) {
                $('#PolitaAddForm').ajaxSubmit(formoptions);
            } else {
                $('#prima').html("Insufficient data.");
            }
        }
        var formoptions = {
            target: '#prima',
            url: '/rca/ajax/calcul'
        };

        When you first open the form page, the div with the id "prima" shows "Insufficient data." Once you fill in enough of the form, that text changes into a number, and that number is what I want. There are other ways to get it, but this would be the most straightforward one.
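Mechanize does not run JavaScript, so whatever calculate() writes into #prima never appears in $mech->content. The snippet does show where the number comes from: ajaxSubmit() (from the jQuery Form plugin) sends the form's fields to /rca/ajax/calcul and puts the reply into #prima. One possible workaround is to send that request yourself. This is only a sketch under assumptions: the form is a plain URL-encoded one, reusing the form's own method matches what ajaxSubmit() does here, and the server needs nothing else that the page's JavaScript sets up. The helper ajax_request_for, the example URL and the field name are all hypothetical.

```perl
use strict;
use warnings;
use HTML::Form;
use URI;

# Hypothetical helper: build roughly the request ajaxSubmit() would send
# for $form: the form's own fields and method, but aimed at $ajax_path
# instead of the form's action attribute.
sub ajax_request_for {
    my ($form, $ajax_path) = @_;

    # Resolve the AJAX path against the form's (absolute) action URL.
    $form->action(URI->new_abs($ajax_path, $form->action));

    my $req = $form->make_request;

    # jQuery marks its AJAX calls with this header; some servers check it.
    $req->header('X-Requested-With' => 'XMLHttpRequest');
    return $req;
}

# Usage (runs only when the form page's URL is given on the command line):
#   perl ajax_calc.pl http://www.example.com/path/to/form/page
if (@ARGV) {
    require WWW::Mechanize;
    my $mech = WWW::Mechanize->new(autocheck => 1);
    $mech->get($ARGV[0]);
    $mech->form_id('PolitaAddForm');
    # Hypothetical field name and value; use the real ones from the form.
    $mech->field('some_field', 'some value');
    $mech->request(ajax_request_for($mech->current_form, '/rca/ajax/calcul'));
    # The reply is what the page would have put inside <div id="prima">.
    print $mech->content, "\n";
}
```

If the reply turns out to be an HTML fragment rather than a bare number, it can be parsed like any other page.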