in reply to HTML::Parser problem

The problem can be seen in the output index.xml if you look for the line <Number>3002240</Number>. The element

<Name>INSAVER</Name>

should be

<Name>INSAVER 150TOPP 10/13W 3023460</Name>

Just as an added note, I am now rewriting this code using HTML::LinkExtractor and am no longer seeing the problem. Very weird, though. If anyone has any idea why, it would be interesting to know!

Re: HTML::Parser problem
by b10m (Vicar) on Mar 29, 2004 at 09:00 UTC

    Maybe I am oversimplifying this, but all those modules and all that code look like overkill if the HTML file mentioned is the only file. Something quick'n'dirty like this would do too (to start with, of course), right?

    $\ = "\n";
    print "<Products>";
    open FH, "<prodnr.html";
    while (<FH>) {
        if ($_ =~ /HREF="(.*pdf)".*(\d{7}) ([^<]+)/) {
            print "<Product>";
            print "\t<Name>$3</Name>";
            print "\t<PDF>$1</PDF>";
            print "\t<Number>$2</Number>";
            print "</Product>";
        }
    }
    close FH;
    print "</Products>";
    --
    b10m

    All code is usually tested, but rarely trusted.

      You're not oversimplifying, since using a regex is hardly simpler to me than using HTML::Parser or its brethren ;-)

      Point well taken though!