I think that perhaps, since this is fetching data from dynamically generated web pages, I am intermittently getting some bad data. I capture all the data to a local file before processing. This way, if I encounter the problem, I should be able to reproduce it and then debug it. Ok great, except the same data doesn't produce the same results. Getting desperate, I decide to upgrade Perl from 5.10.0 to 5.10.1, along with all the modules I am using. Same problem. WTF, I say again.
I start sprinkling my code with a whole bunch of print statements (poor man's debugger). I determine that a particular sub is entered but never left. Ok, so what is going on in this sub? It is very short and looks straightforward, and then I spot it:
my @clean = map clean_html($_), @dirty;
I look at clean_html() and it too looks harmless at first:
sub clean_html {
    my ($html) = @_;
    state $hs = HTML::Strip->new();
    my $clean = $hs->parse($html);
    $clean =~ s/^\s+|\s+$//g;
    return $clean;
}
I wonder if this is a 5.10.x bug involving state, so I change it to my and the problems go away. Yay, isn't everyone going to be happy I helped find a bug? Let me go gather the information on HTML::Strip so I can include it in the bug report. Wait, what's this?
HTML::Strip maintains state between calls, so you can parse a document in chunks should you wish. If one chunk ends half-way through a tag, quote, comment, or whatever; it will remember this, and expect the next call to parse to start with the remains of said tag.
If this is not going to be the case, be sure to call $hs->eof() between calls to $hs->parse().
D'oh! The fix was simple - just add $hs->eof() in clean_html() and I could reinstate my state optimization. The reason it wasn't consistently reproducible, even with the same data, was that I was populating @dirty from a hash and the values weren't always coming out in the same order. I spent far too much time debugging this simple problem, which would not have existed if I had RTFM'd better. Any such stories you care to share?
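For anyone who wants to see the two lessons side by side, here is a minimal sketch, not the original program. It assumes HTML::Strip is installed from CPAN; %page and its contents are hypothetical stand-ins for the captured data. The sub calls $hs->eof() after every parse so that a truncated tag in one string cannot leak into the next, and the driver sorts the hash keys so that the processing order is the same on every run.

```perl
use strict;
use warnings;
use feature 'state';
use HTML::Strip;

sub clean_html {
    my ($html) = @_;
    state $hs = HTML::Strip->new();   # built once, reused on every call
    my $clean = $hs->parse($html);
    $hs->eof();                       # reset the parser so a half-open tag or quote is forgotten
    $clean =~ s/^\s+|\s+$//g;         # trim leading and trailing whitespace
    return $clean;
}

# Hypothetical captured data; the 'body' entry ends half-way through a tag,
# which is the kind of input that poisoned later calls before eof() was added.
my %page = (
    intro => '<p>Hello <b>world</b></p>',
    body  => 'broken <a href="unterminated',
    outro => '<i>Goodbye</i>',
);

# Hash order is not guaranteed, so sort the keys to get a repeatable order.
my @dirty = map { $page{$_} } sort keys %page;
my @clean = map { clean_html($_) } @dirty;

print "$_\n" for @clean;
```

Sorting the keys would not have fixed the bug, but it would have made the failure show up on every run instead of only some of them, which is most of the battle when debugging.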
Cheers - L~R
In reply to Another Reason RTFMing Is A Good Thing by Limbic~Region