Re: Search Engine, Web Crawling, Data Mining question

You'll want to use Perl's LWP, HTTP and HTML libraries for this kind of data mining.

With those libraries and the modules they include, you can build a capable web crawler that extracts pretty much anything you want from a page (e.g. links to other sites, images, etc.).
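As a rough illustration of the extraction step, here is a minimal sketch that pulls link and image URLs out of an HTML string with HTML::Parser and resolves relative URLs with the URI module. The helper name extract_links and the sample HTML are hypothetical, and it only handles <a href> and <img src>; a real crawler would likely look at more tags.

```perl
use strict;
use warnings;
use HTML::Parser ();
use URI ();

# Hypothetical helper: collect <a href> and <img src> values from an
# HTML string, resolving relative URLs against $base_url.
sub extract_links {
    my ($html, $base_url) = @_;
    my (@links, @images);

    my $parser = HTML::Parser->new(
        api_version => 3,
        # Called for every start tag with its (lowercased) name and attribute hash.
        start_h => [
            sub {
                my ($tag, $attr) = @_;
                if ($tag eq 'a' && defined $attr->{href}) {
                    push @links, URI->new_abs($attr->{href}, $base_url)->as_string;
                }
                elsif ($tag eq 'img' && defined $attr->{src}) {
                    push @images, URI->new_abs($attr->{src}, $base_url)->as_string;
                }
            },
            'tagname, attr',
        ],
    );
    $parser->parse($html);
    $parser->eof;    # flush anything still buffered in the parser
    return { links => \@links, images => \@images };
}

my $html = <<'HTML';
<html><body>
<a href="/about.html">About</a>
<a href="http://www.perl.org/">Perl</a>
<img src="images/logo.png" alt="logo">
</body></html>
HTML

my $found = extract_links($html, 'http://example.com/index.html');
print "link:  $_\n" for @{ $found->{links} };
print "image: $_\n" for @{ $found->{images} };
```

Resolving against the page's own URL matters because most links on a page are relative, and a crawler needs absolute URLs to queue the next fetch.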

There are several ways to implement this, so I would take the time to read the documentation for modules such as:

HTTP::Request
HTTP::Response
LWP::UserAgent
LWP::Simple
HTML::Parser
URI::Heuristic
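To show how a few of those modules fit together, here is a minimal fetch sketch using LWP::UserAgent, HTTP::Request and HTTP::Response. The helper name fetch_page, the agent string "MyCrawler/0.1" and the 15-second timeout are assumptions for illustration. For addresses typed in by a user, URI::Heuristic's uf_uristr can guess a full URL first; that step is left out here.

```perl
use strict;
use warnings;
use LWP::UserAgent ();
use HTTP::Request ();

# Hypothetical helper: fetch one URL and return its decoded body,
# or undef (with a warning) if the request failed.
sub fetch_page {
    my ($url) = @_;
    my $ua = LWP::UserAgent->new(
        agent   => 'MyCrawler/0.1',    # hypothetical bot name; identify your crawler honestly
        timeout => 15,                 # seconds before giving up on a connection
    );
    my $request  = HTTP::Request->new(GET => $url);
    my $response = $ua->request($request);    # returns an HTTP::Response object

    unless ($response->is_success) {
        warn "Failed to fetch $url: ", $response->status_line, "\n";
        return undef;
    }
    return $response->decoded_content;    # body with charset/encoding applied
}
```

Usage would be something like `my $html = fetch_page('http://www.example.com/');`, and the returned HTML can then be handed to an HTML::Parser-based extractor. LWP::Simple's get() does roughly the same in one call, but going through LWP::UserAgent gives you the status line and headers when something goes wrong.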

From there you can explore a couple of different approaches to your goal. Keep in mind, however, that some sites take measures to block spiders, so your options may be limited as to which direction you take (i.e., the easy way or the harder way).
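One common courtesy, and often a requirement, is honoring a site's /robots.txt. A minimal sketch using LWP::RobotUA, which is a subclass of LWP::UserAgent that fetches and obeys robots.txt and waits between requests to the same host, might look like this. The bot name, contact address and one-second delay are hypothetical values, and polite_get is an illustrative helper, not part of LWP.

```perl
use strict;
use warnings;
use LWP::RobotUA ();

# LWP::RobotUA requires both an agent name and a contact address.
my $ua = LWP::RobotUA->new(
    agent => 'MyCrawler/0.1',              # hypothetical bot name
    from  => 'crawler-admin@example.com',  # hypothetical contact address
);
$ua->delay(1/60);    # delay is given in minutes: wait about 1 second per host

# Hypothetical helper: fetch a URL only if robots.txt allows it.
sub polite_get {
    my ($url) = @_;
    my $response = $ua->get($url);
    # A URL disallowed by robots.txt comes back as 403 "Forbidden by robots.txt".
    return $response->is_success ? $response->decoded_content : undef;
}
```

Note that this only covers robots.txt; sites that block spiders in other ways (logins, rate limiting, scripted pages) need separate handling, which is where the "harder way" comes in.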

Hope that helps.

Amel - f.k.a. - kel
