in reply to Why would a Perl script stop with no informatives?

As I read the source of the page you referenced with the tinyurl, there are 40 <a href...</a>s; 3 of those (17, 18 and 19) are hidden in a comment.

Number 24, counting those in the comment, <a href="http://gs.statcounter.com/press/bing-gains-another-1-perc-of-search-market">Bing Gains Another 1% of Search Market</a> doesn't seem to have anything that would cause the effect; neither does number 24, ignoring those in the comment.

In fact, IMO, the most remarkable thing about that page is that there are no obvious html, css or js errors (though I haven't looked at the linked css, nor at the linked js).

  1. its html is entirely valid to PerlTidy and the w3c validator;
      and
  2. it's marked for utf-8.

You might make it easier for wiser monks to help by posting the partial data up to where the script stops, and by pinpointing the character at which you believe the problem arises.

Re^2: Why would a Perl script stop with no informatives?
by Anonymous Monk on Dec 22, 2009 at 05:20 UTC

    First, my sincere thanks for looking at what is, in all likelihood, a perception error on my part. Since my last post, I've looked into the buffering, and believe that, yes, it's stopping elsewhere than indicated due to buffering... the real clue was that it stopped WAY later when run in the debugger....
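    A minimal sketch of how to take buffering out of the picture, so that the last line printed is close to where the script really stopped. The loop and its progress message are made up for illustration; only the autoflush calls matter. Buffered output can be lost or delayed when a process is killed or runs out of memory, or when STDOUT goes to a file or pipe rather than a terminal.

```perl
use strict;
use warnings;
use IO::Handle;

# Flush STDOUT after every print instead of waiting for the buffer to fill.
STDOUT->autoflush(1);
# STDERR is normally unbuffered already; setting it explicitly is harmless.
STDERR->autoflush(1);

# Hypothetical progress message, standing in for the script's own prints.
sub progress_line {
    my ($n) = @_;
    return "processing anchor $n\n";
}

print progress_line($_) for 1 .. 3;
```

    The older idiom `$| = 1;` has the same effect on the currently selected handle, which is STDOUT unless the script has called select.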

    That being said, I found some coding errors that occur shortly after where the prints stopped, fixed them, and that appears to have removed the problem.

    I am attempting - more for the learning experience than otherwise - to take the anchors looted from each page and pass each one (as a string), along with a ref to the URI::new()'d object holding the current page's URI, as two args to a sub (called fqURL) that handles relative and short-absolute URIs and returns a fully qualified URL for the next page-fetch.

    called as:

    my ($newURI, $newScheme, $newHost, $newPathSegs, $newExtension, $newQuery) = fqURL($anchor, $thisURI);
    # so it returns an object, a string, a string, an array of strings, and two more strings

    My first problem was how to make the sub recognize the object passed as its second parm - I had used a convention of partial-caps to denote the object (oldURI), and initial-caps to denote the string representation of the object that I created from it (oldUri). Of course, I looked right past the mistake several times.... but - once it was found, it turned out to be just obscuring smoke! The actual problem was an anchor that had an href of '.' - a single standalone dot. This, when fed back into the Mech for a fetch, made the program refetch the same page forever, which somehow (still not clear to me) caused the script to crash - probably because it was making fqURLs of successively larger size, eventually running out of memory.
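    The single-dot case can be reproduced with the URI module alone: an href of '.' resolves to the directory of the base URI, so on a page whose URI already ends in '/' it resolves to the page itself. The sketch below is one possible guard, not the code from the post (which is not shown): next_url, %seen and the example.com addresses are all hypothetical, and it assumes that remembering every canonical URL already queued is acceptable for the size of the crawl.

```perl
use strict;
use warnings;
use URI;   # CPAN module, the same one the post already uses

my %seen;  # hypothetical record of pages already fetched or queued

# Hypothetical helper: resolve $href against $base (a URI object) and
# return the absolute URI, or undef if that page was seen before.
sub next_url {
    my ($href, $base) = @_;
    my $abs = URI->new_abs($href, $base)->canonical;
    $abs->fragment(undef);               # '#section' links point at the same page
    return if $seen{ $abs->as_string }++;
    return $abs;
}

my $page = URI->new('http://example.com/press/');   # made-up address
$seen{ $page->canonical->as_string } = 1;            # the current page is already fetched

for my $href ('.', 'bing-gains', '/about#team') {
    my $next = next_url($href, $page);
    print defined $next ? "fetch $next\n" : "skip '$href' (already seen)\n";
}
```

    Keying the hash on the canonical string form, rather than on the raw href, is what catches '.', './' and the full URL as the same page.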

    So, as expected, the problem was not in the parser at all, was obscured by the buffering delays until the helpful posts here were read, and was entirely in my inadequate understanding of ref-passing in and out of subs - there actually WAS code in the sub to detect and defang the single-dot problem - it just wasn't being called, because of a capitalization-typo. With this experience in hand, I'm gonna re-write the whole dang thing to erase that caps convention, and replace it with something less easy to elide visually while debugging. At least, thanks again to prodding from my brother monks here, I got some experience with the debugger out of the whole mess.

    Thanks, Folks, for all the help and patience!

    Dick Martin Shorter, novice perl monk