As I read the source of the page you referenced with the tinyurl, there are 40 <a href=...>...</a> anchors; 3 of those (numbers 17, 18 and 19) are hidden in a comment.
Number 24, counting those in the comment, <a href="http://gs.statcounter.com/press/bing-gains-another-1-perc-of-search-market">Bing Gains Another 1% of Search Market</a>, doesn't seem to have anything that would cause the effect; neither does number 24 when the commented-out anchors are ignored.
In fact, IMO, the most remarkable thing about that page is that there are no obvious html, css or js errors (though I haven't looked at the linked css, nor at the linked js).
- its html is entirely valid according to PerlTidy and the w3c validator; and
- it's marked for utf-8.
You might help wiser monks assist you by posting the partial output up to where the script stops, pinpointing the character at which you believe the issue arises.
First, my sincere thanks for looking at what is, in all likelihood, a perception error on my part. Since my last post, I've looked into the buffering, and believe that, yes, it's stopping elsewhere than indicated because of buffering... the real clue was that it stopped WAY later when run in the debugger.
That being said, I found some coding errors that occurred shortly after the point where the prints stopped, fixed them, and that appears to have removed the problem.
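For anyone who hits the same thing: the usual quick way to rule buffering in or out (generic boilerplate, not lifted from my script) is to unbuffer STDOUT before the debug prints:

    use IO::Handle;          # gives filehandles an autoflush() method
    STDOUT->autoflush(1);    # flush every print immediately
    # or, old-school, for the currently selected handle:
    $| = 1;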
I am attempting - more for the learning experience than otherwise - to take the anchors looted from each page and pass them (as a string), along with a ref to the URI->new()'d object holding the current page's URI, as two args to a sub (called fqURL) that handles relative and short-absolute URIs and returns a fully qualified URL for the next page-fetch.
called as:
my ($newURI, $newScheme, $newHost, $newPathSegs, $newExtension, $newQuery)
    = fqURL($anchor, $thisURI);
# so it returns an object, a string, a string, an array of strings, and two more strings #
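(For illustration only: a minimal sub along those lines can be built on the core URI module. The body below is a sketch of that approach, not the script's actual fqURL.)

    use strict;
    use warnings;
    use URI;

    sub fqURL {
        my ($anchor, $thisURI) = @_;

        # new_abs() resolves relative hrefs ('.', '..', 'page.html', '//host/x')
        # against the URI of the page the anchor was taken from.
        my $newURI = URI->new_abs($anchor, $thisURI);

        my $scheme   = $newURI->scheme // '';
        my $host     = $newURI->can('host')          ? $newURI->host          : '';
        my @pathSegs = $newURI->can('path_segments') ? $newURI->path_segments : ();
        my ($ext)    = ($pathSegs[-1] // '') =~ /\.(\w+)\z/;
        my $query    = $newURI->query // '';

        # the path segments go back as a reference so they survive the flat list return
        return ($newURI, $scheme, $host, \@pathSegs, $ext // '', $query);
    }

Returning \@pathSegs as a reference, rather than a flat array, is what keeps the six return values lined up the way the comment above describes.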
My first problem was how to make the sub recognize the object passed as its second parm - I had used a convention of partial-caps to denote the object (oldURI), and initial-caps to denote the string representation of the object that I created from it (oldUri). Of course, I looked right past the mistake several times.... but - once it was found, it turned out to be just obscuring smoke! The actual problem was an anchor that had an href of '.' - a single standalone dot. This, when fed back into the Mech for a fetch, made the program refetch the same page forever, which somehow (still not clear to me) caused the script to crash - probably because it was making fqURLs of successively larger size, eventually running out of memory.
So, as expected, the problem was not in the parser at all; it was obscured by the buffering delays until the helpful posts here were read, and was entirely in my inadequate understanding of ref-passing in and out of subs - there actually WAS code in the sub to detect and defang the single-dot problem, it just wasn't being called, because of a capitalization typo. With this experience in hand, I'm gonna re-write the whole dang thing to erase that caps convention and replace it with something less easy to overlook visually while debugging. At least, thanks again to prodding from my brother monks here, I got some experience with the debugger out of the whole mess.
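One generic way to defang that case (the %seen hash and the names below are illustrative, not the script's actual code) is to compare each resolved link against the page it came from, and against everything already fetched:

    my @toFetch;
    my %seen;
    foreach my $anchor (@anchors) {
        my ($newURI) = fqURL($anchor, $thisURI);

        next if $newURI->eq($thisURI);          # '.' resolves straight back to the current page
        next if $seen{ $newURI->canonical }++;  # never queue the same URL twice

        push @toFetch, $newURI;
    }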
Thanks, Folks, for all the help and patience!
Dick Martin Shorter, novice perl monk
What are the 5 bad pages? I'd be really curious to see if the HTML is so bad that it chokes up the parser.
You could also try the perl debugger.
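For example (the script name is a placeholder):

    perl -d yourscript.pl

Inside it, n steps over a statement, s steps into it, and x dumps a variable - usually enough to see exactly where a run dies.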
I only have one at my fingertips: http://tinyurl/nvzfar
It seems to be pretty straightforward... and the parse completes, both with HTML::TokeParser and WWW::Mechanize. The failure occurs some fixed amount of execution after the parse is finished. Even if I'm just stacking meaningless prints, it fails... (only tested with Mechanize)
Let me say this again: after this loop:
foreach my $link (@links) {
    print "LINK: " . $link->url() . "\n" if ($DEBUG>=1);
    push(@anchors, $link->url());
}
my $goodAnchors = 0;
print " @ 1ANCLOOP\n" if ($DEBUG>=1);
print " @ 2ANCLOOP\n" if ($DEBUG>=1);
print " @ 3ANCLOOP\n" if ($DEBUG>=1);
print " @ 4ANCLOOP\n" if ($DEBUG>=1);
. . .
print " @ 29ANCLOOP\n" if ($DEBUG>=1);
print " @ 30ANCLOOP\n" if ($DEBUG>=1);
it stops at "24ANCLOOP" - 24 out of 30 meaningless print statements...
BTW, I just replaced the link parsing code (used to use HTML::TokeParser) with WWW::Mechanize, and the same thing still happens, albeit at a slightly different place. Of course, the failure changes each time I add tracing prints... but is completely repeatable, down to the character it fails on in a print, if that's where it is failing.
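For reference, the Mechanize side of that looks roughly like this (variable names and the autocheck choice are mine, not necessarily the script's):

    use strict;
    use warnings;
    use WWW::Mechanize;

    my $mech = WWW::Mechanize->new( autocheck => 1 );   # die on fetch errors
    $mech->get($startUrl);                               # $startUrl is a placeholder
    my @links = $mech->links();                          # WWW::Mechanize::Link objects
    print "LINK: ", $_->url(), "\n" for @links;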
What's the process's exit code? (echo $?)
ummmm, this is Windoze... there's probably a way to get the process exit code by putting it in a batch script, but...
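(For what it's worth, cmd.exe keeps the last exit code in ERRORLEVEL, so the rough equivalent of echo $? is:)

    echo %ERRORLEVEL%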
Oh, I got fooled by your mention of signals. Windows doesn't have signals*. Seeing as I was trying to figure out which signal killed your app, ignore my request.
That also rules out the other ideas I had, sorry.
* — Well, you could consider Ctrl-C and Ctrl-Break signals, but that's it. Windows apps use messages instead, and they aren't deadly. You can't even send one to a console app unless it creates a Window.