http://qs1969.pair.com?node_id=1010409


in reply to Machine learning pattern matching...

> What I'm thinking of is along the lines of an algorithm that looks for repetition in the HTML structure of the page, and then examines them for the relevant data - could be table rows, divs, paragraphs, lists - trying to be as generic as possible...

Sounds to me like a combination of web mining and cluster analysis! (?)

I doubt you will find any ready-to-use modules combining both¹, because this is a core technology for some of the big players in the web business.

Cheers Rolf

¹) Especially not one as generic as you asked for.