note
LanX
<i> > What I'm thinking of is along the lines of an algorithm that looks for repetition in the HTML structure of the page, and then examines them for the relevant data - could be table rows, divs, paragraphs, lists - trying to be as generic as possible... </i><P>
Sounds to me like a combination of [wp://web mining] and [wp://cluster analysis] (?)<P>
I doubt you'll find any ready-to-use module combining both¹, because this is core technology for some big players in the web business.<P>
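To make the "look for repetition in the HTML structure" part a bit more concrete, here is a minimal sketch of that half only, not of the clustering. It is an illustration under assumptions, not a known module: the names `PathCounter` and `repeated_paths` and the `min_count` threshold are made up for this example, and it uses only Python's standard `html.parser`. It counts how often each root-to-element tag path occurs and reports the ones that repeat.<P>

```python
from collections import Counter
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not stay on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class PathCounter(HTMLParser):
    """Counts how often each root-to-element tag path occurs (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.stack = []         # currently open tags, outermost first
        self.paths = Counter()  # "html/body/ul/li" -> number of occurrences

    def handle_starttag(self, tag, attrs):
        self.paths["/".join(self.stack + [tag])] += 1
        if tag not in VOID:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <br/>: count it, but never push it.
        self.paths["/".join(self.stack + [tag])] += 1

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def repeated_paths(html, min_count=3):
    """Return {tag_path: count} for paths seen at least min_count times."""
    parser = PathCounter()
    parser.feed(html)
    parser.close()
    return {p: n for p, n in parser.paths.items() if n >= min_count}


if __name__ == "__main__":
    page = ("<html><body><ul><li>a</li><li>b</li><li>c</li></ul>"
            "<p>x</p></body></html>")
    print(repeated_paths(page))
```

This only finds candidates. Deciding which repeated blocks actually carry the relevant data is the clustering part, and markup that relies on implicitly closed tags (for example `<li>` without `</li>`) would need a real HTML tree builder rather than this simple stack.<P>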
<div class="pmsig"><div class="pmsig-708738">
<p>Cheers Rolf
</div></div><P>
¹) Especially not one as generic as you asked for.
1010380
1010380