Hello, I am trying to improve my understanding of WWW::Mechanize. I have built a simple website of a few pages to practice traversing it with WWW::Mechanize and reading HTML tags, attributes and content with WWW::Mechanize::TreeBuilder.

The website is quite simple for now: a top-level index.html containing a single table. Each table row has a few cells containing text and links. I am trying to read the links, follow each one to the next page, gather some data, print it, then come back and continue with the next row of the table.

Ultimately I would like to traverse a large table and decide row by row whether to store data from that row and follow its link to the next page, or to skip the row because it doesn't meet my criteria and move on to the next one with no action taken.

I am starting with a simple test skeleton: my index.html page, with rows and links leading to a few other pages (s1.html, s2.html, s3.html).

I run into problems after leaving the current page while looping through the list of links. I would like to leave, gather and print some data, then come back and continue my loop with the next link. What actually happens is that my program crashes at this point, complaining of uninitialized values in /path/to/HTML/Element.pm. With all that said, here is the code I am having problems with. If I can get my page following and retreating logic nailed down, that will be a big step for me.

    use WWW::Mechanize;
    use WWW::Mechanize::TreeBuilder;

    my $mech = ...

    my @list = $mech->look_down(_tag => "a", class => "links");

    foreach (@list) {
        # see if I want to skip this row, or save/print some
        # data and follow link to next page

        # printing data works fine

        # following a link breaks the loop
        $mech->get($new_url);   # finds the page no problem

        # do stuff on page, then go back
        $mech->back();
        # complaint is of uninitialized "tag", from the look_down
        # call I assume?
    }

What I believe is happening is the program runs the main loop OK, but when it leaves the current page, something happens to @list. I don't know what, but leaving the page with $mech->get() seems to break my program.
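To make that guess concrete, here is a minimal sketch of a workaround I am considering. It rests on an assumption I have not confirmed: that the element objects in @list belong to the index page's parse tree, and that the tree is discarded when $mech->get() loads another page. If so, copying the href values out as plain strings before leaving the page should sidestep the problem. The http://localhost/index.html address is a hypothetical stand-in for my test site.

```perl
use strict;
use warnings;
use URI;
use WWW::Mechanize;
use WWW::Mechanize::TreeBuilder;

my $mech = WWW::Mechanize->new;
WWW::Mechanize::TreeBuilder->meta->apply($mech);

# Hypothetical address of the test site; substitute the real one.
$mech->get('http://localhost/index.html');

# While the index page's tree is still current, copy each href out as a
# plain absolute-URL string. Strings do not depend on the parse tree.
my $base = $mech->base;
my @urls = map  { URI->new_abs($_, $base)->as_string }
           grep { defined }
           map  { $_->attr('href') }
           $mech->look_down(_tag => 'a', class => 'links');

for my $url (@urls) {
    $mech->get($url);              # leave the index page
    print $mech->uri, "\n";        # gather/print data from the linked page here
    # No back() needed: the remaining URLs are absolute strings.
}
```

Because the loop iterates over strings rather than element objects, it should not matter what happens to the index page's tree after the first get(). Any row-by-row decisions that need the table cells would have to be made in the first pass, before navigating away.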


In reply to WWW::Mechanize::TreeBuilder and WWW::Mechanize. Following links but can't return without error by mdro79
