"Do you know such a cpan module which can provide this functionality?"
There's no module on cpan which matches all three criteria. There are also other considerations, for example PDF files may simply by scanned images, meaning you'd have to OCR them to get the text. WWW::Mechanize::FireFox, PDF::OCR2, Super Search.
In reply to Re: web crawler infrastructure
by marto
in thread web crawler infrastructure
by david2008
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |