in reply to look_up() in HTML::Element Not Traversing As Expected

If we clean up that HTML so that most of the cruft is removed we get:

<tr valign="top">
  <td>
    <table class="imagetable">
      <tr>
        <td>
          <a href="URL1"><img src="IMG1"></a>
          <a href="URL2">JASON FEDDY</a>
        </td>
      </tr>
    </table>
  </td>
  <td>Replied&nbsp;</td>
  <td><a class="mailtext" href="URL3">Hello mate</a></td>
</tr>

Now look at the HTML and notice that you are finding URL3 but want to navigate back to URL1. look_up can't do that: it only checks the element itself and then its ancestors. URL1 is not above URL3 in the element tree; it's in a completely different branch!
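To see why, here is a minimal sketch, assuming the cleaned-up sample is parsed with HTML::TreeBuilder. The wrapping <table> and the compact one-string layout are my additions so the fragment parses in a table context; they are not part of the original page. It lists the ancestors of the URL3 link and shows that look_up never reaches URL1, while look_down from the root does.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Cleaned-up sample; the outer <table> wrapper is an assumption added for parsing.
my $html = '<table><tr valign="top"><td><table class="imagetable"><tr><td>'
    . '<a href="URL1"><img src="IMG1"></a><a href="URL2">JASON FEDDY</a>'
    . '</td></tr></table></td><td>Replied&nbsp;</td>'
    . '<td><a class="mailtext" href="URL3">Hello mate</a></td></tr></table>';

my $tree = HTML::TreeBuilder->new_from_content($html);
my $url3 = $tree->look_down(_tag => 'a', class => 'mailtext')
    or die "URL3 link not found";

# Tag names of URL3's ancestors, nearest first. This is all look_up can see
# (plus the starting element itself).
my @ancestors = $url3->lineage_tag_names;
print "ancestors of URL3: @ancestors\n";

# Searching upwards for URL1 fails because URL1 is not an ancestor of URL3.
my $found_up = defined $url3->look_up(_tag => 'a', href => 'URL1') ? 'yes' : 'no';

# Searching downwards from the root finds it, since the root is above both links.
my $found_down = defined $tree->look_down(_tag => 'a', href => 'URL1') ? 'yes' : 'no';

print "URL1 via look_up from URL3:   $found_up\n";
print "URL1 via look_down from root: $found_down\n";

$tree->delete;    # free the tree explicitly
```

None of the ancestors is an <a> element, so no set of look_up criteria can ever match URL1 from that starting point.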

You need to sit back and think about your rules for finding URL1 given URL3. There are a bunch of ways it could be done, but it depends a great deal on how the structure of the HTML can change. I don't think it is even worth giving a solution in this particular case because you really need to accommodate possible changes in the HTML and I have no idea what those may be. In the simplest case you can just navigate up using parent, then work your way back down indexing into the contents array. But that is mighty fragile!
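Purely to illustrate that simplest case, and not as a recommended solution, here is a sketch that assumes the HTML stays exactly as in the cleaned-up sample. The wrapping <table> and the variable names are my own assumptions. It climbs two parents from the URL3 link to the row, indexes into the row's contents to get the first cell, then indexes into the links found in that cell.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Cleaned-up sample; the outer <table> wrapper is an assumption added for parsing.
my $html = '<table><tr valign="top"><td><table class="imagetable"><tr><td>'
    . '<a href="URL1"><img src="IMG1"></a><a href="URL2">JASON FEDDY</a>'
    . '</td></tr></table></td><td>Replied&nbsp;</td>'
    . '<td><a class="mailtext" href="URL3">Hello mate</a></td></tr></table>';

my $tree = HTML::TreeBuilder->new_from_content($html);
my $url3 = $tree->look_down(_tag => 'a', class => 'mailtext')
    or die "URL3 link not found";

# Up: a -> td -> tr. This hard-codes the nesting depth.
my $row     = $url3->parent->parent;
my $row_tag = $row->tag;

# Down: content_list returns child elements and text nodes; keep elements only,
# then take the first cell by position. This hard-codes the column order.
my @cells      = grep { ref } $row->content_list;
my $first_cell = $cells[0];

# The links inside that cell, in document order. This hard-codes the link order.
my @links = $first_cell->look_down(_tag => 'a');
my $url1  = $links[0]->attr('href');
my $url2  = $links[1]->attr('href');

print "row tag: $row_tag, first link: $url1, second link: $url2\n";

$tree->delete;    # free the tree explicitly
```

Every hard-coded step (two parents up, first cell, first or second link) breaks if a column is added, a wrapper element appears, or the link order changes, which is exactly the fragility described above.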


DWIM is Perl's answer to Gödel

Re^2: look_up() in HTML::Element Not Traversing As Expected
by initself (Monk) on Apr 29, 2006 at 03:54 UTC
    I want URL2. Does that change anything? How can you tell where one branch ends and another begins? I traversed the entire tree to get all the URL elements, so at one point they were all accessible to me using look_down().

      Nope. URL2 is still "up a few, over a couple, down a couple and over a couple". It's a sibling of URL1. Originally it was in a span element, but for illustration purposes that doesn't matter.

      I hope you see, BTW, the virtue of cleaning up the sample data to the point where we are talking about only the relevant structure and simple data? You ought to do this sort of thing pretty much whenever you have a problem to solve - remove the cruft and concentrate on the real problem.

      The real problem here is that the connection between the data you are matching and the data you want is rather tenuous, so you have to make sure you understand exactly what that relationship is before you can write code to implement it. I think it is useful to think in terms of parent/child and sibling relationships here.
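One way to spell out such a relationship, offered as a sketch under assumptions rather than a definitive answer: treat the enclosing row as the common ancestor, reach it with look_up, then describe the wanted link by its role (the link in the imagetable that does not wrap an image) instead of by position. The rule itself, the wrapping <table>, and the variable names are my assumptions; only the sample structure comes from the thread.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Cleaned-up sample; the outer <table> wrapper is an assumption added for parsing.
my $html = '<table><tr valign="top"><td><table class="imagetable"><tr><td>'
    . '<a href="URL1"><img src="IMG1"></a><a href="URL2">JASON FEDDY</a>'
    . '</td></tr></table></td><td>Replied&nbsp;</td>'
    . '<td><a class="mailtext" href="URL3">Hello mate</a></td></tr></table>';

my $tree = HTML::TreeBuilder->new_from_content($html);
my $url3 = $tree->look_down(_tag => 'a', class => 'mailtext')
    or die "URL3 link not found";

# Parent/child: the nearest enclosing row is an ancestor of URL3 and also
# contains the imagetable, so it is the branch point between the two.
my $row = $url3->look_up(_tag => 'tr')
    or die "no enclosing row";

# Down the other branch: the imagetable somewhere below that row.
my $image_table = $row->look_down(_tag => 'table', class => 'imagetable')
    or die "no imagetable in this row";

# Sibling relationship: of the links in the imagetable, take the one that
# has no <img> below it (assumed rule for "the name link").
my $name_link = $image_table->look_down(
    _tag => 'a',
    sub { !$_[0]->look_down(_tag => 'img') },
) or die "no text link in the imagetable";

my $name_href = $name_link->attr('href');
my $name_text = $name_link->as_text;
print "$name_text => $name_href\n";

$tree->delete;    # free the tree explicitly
```

This still depends on the row being the common ancestor and on the class name imagetable, but it survives extra columns or wrapper elements better than counting parents and indexing children.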

