Re: HTML::Parser problem

The problem shows up in the output file index.xml: look for the line <Number>3002240</Number>. The element

<Name>INSAVER</Name>

should be

<Name>INSAVER 150TOPP 10/13W 3023460</Name>

As an added note, I am now rewriting this code using HTML::LinkExtractor and no longer see the problem. Still very weird, though; if anyone has an idea why it happened, I would be interested to know!

Re: HTML::Parser problem by b10m (Vicar) on Mar 29, 2004 at 09:00 UTC
Maybe I am oversimplifying this, but all those modules and all that code look like overkill if the HTML file mentioned is the only file. Something quick'n'dirty like this would do too (to start with, of course), right?

    $\ = "\n";    # append a newline to every print
    print "<Products>";
    # Assumes each link sits on one line: HREF="....pdf", then a 7-digit number, a space, and the name
    open my $fh, '<', 'prodnr.html' or die "Cannot open prodnr.html: $!";
    while (<$fh>) {
        if (/HREF="(.*?pdf)".*?(\d{7}) ([^<]+)/) {
            print "<Product>";
            print "\t<Name>$3</Name>";
            print "\t<PDF>$1</PDF>";
            print "\t<Number>$2</Number>";
            print "</Product>";
        }
    }
    close $fh;
    print "</Products>";

-- b10m
All code is usually tested, but rarely trusted.
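For anyone who wants to try the idea in isolation, here is a minimal sketch of a line-by-line regex of the kind suggested above, run against a single made-up line. The sample markup is an assumption (the real prodnr.html is not shown in this thread), and the pattern is a slightly tightened variant, not necessarily what was posted. It only illustrates that the whole link text, including the trailing number, ends up in the name capture.

```perl
use strict;
use warnings;

# Hypothetical input line; the actual markup of prodnr.html is not shown in the thread.
my $line = '<A HREF="3002240.pdf">3002240 INSAVER 150TOPP 10/13W 3023460</A>';

my ($pdf, $number, $name);

# $1: PDF path inside HREF="...", $2: 7-digit product number, $3: rest of the link text
if ($line =~ /HREF="([^"]*\.pdf)"[^>]*>(\d{7}) ([^<]+)/) {
    ($pdf, $number, $name) = ($1, $2, $3);
    print "PDF:    $pdf\n";
    print "Number: $number\n";
    print "Name:   $name\n";
}
```

The name capture uses [^<]+, so it runs up to the closing tag and is not cut short at the first space.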
Re: Re: HTML::Parser problem by Peamasii (Sexton) on Mar 29, 2004 at 09:05 UTC
You're not oversimplifying, since using a regex is hardly simpler to me than using HTML::Parser or its brethren ;-) Point well taken though!