in reply to document clustering via link contexts

The approach that comes to mind is to run some sort of LSI or vector-space search over the words in the text surrounding each link, and then relate the URLs by how similar those contexts are. This perl.com article and the references it gives may be of help.
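To make the idea concrete, here is a rough sketch of the plain vector-space half of it (no LSI, so no SVD step), written in Python using only the standard library. Everything in it is an assumption for illustration: the example.com URLs, the sample context strings, the tiny stopword list, and the helper names `tokenize`, `build_vectors` and `cosine` are all made up, and the smoothed TF-IDF weighting is just one common choice.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list (an assumption; a real one would be longer).
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "with"}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens, minus stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]

def build_vectors(contexts):
    """Turn (url, surrounding_text) pairs into {url: {term: tf-idf weight}}.

    All contexts seen for the same URL are merged into one term-count bag.
    """
    term_counts = defaultdict(Counter)
    for url, text in contexts:
        term_counts[url].update(tokenize(text))

    n_docs = len(term_counts)
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())

    vectors = {}
    for url, counts in term_counts.items():
        # Smoothed IDF so a term found in every context still gets some weight.
        vectors[url] = {
            term: tf * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, tf in counts.items()
        }
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical link contexts: (URL, text found around the link).
contexts = [
    ("http://example.com/a", "perl modules for text search and indexing"),
    ("http://example.com/b", "search engine indexing with perl"),
    ("http://example.com/c", "recipes for chocolate cake"),
]

vectors = build_vectors(contexts)
urls = sorted(vectors)
for i, u in enumerate(urls):
    for v in urls[i + 1:]:
        print(u, v, round(cosine(vectors[u], vectors[v]), 3))
```

URLs whose contexts score above some threshold could then be grouped together. LSI would add a dimensionality-reduction step on the term-by-URL matrix before the similarities are computed, which is not shown here.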
