in reply to Re^5: Async DNS with LWP
in thread Async DNS with LWP
OK, so at this point I'm thinking:
* LWP and Mechanize are nice toys for a quick proof of concept of a real web crawler, but in practice they are not useful for anything more than low-bandwidth automated tasks.
* With AnyEvent::HTTP and Coro you can build a proof of concept that performs better, but you're still not quite there.
* To build a truly high-performance parallel web crawler that makes the best use of network resources, with parallel asynchronous DNS and parallel HTTP requests, I either need to use Perl's bloated thread model and work directly with Perl's UDP and TCP interfaces, or give up on Perl and build this in C.
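For reference, the AnyEvent::HTTP approach from the second point can be sketched roughly as below. This is only a minimal illustration, not a tuned crawler: `fetch_all` and its optional `$fetch` argument are hypothetical names introduced here, and the 30-second timeout is an arbitrary assumption. As far as I understand it, AnyEvent::HTTP resolves host names through AnyEvent's own non-blocking resolver, so DNS lookups and HTTP transfers should overlap within a single process.

```perl
use strict;
use warnings;
use AnyEvent;
use AnyEvent::HTTP;

# Fetch every URL in @$urls concurrently.
# Returns a hashref: { url => { status => ..., length => ... } }.
# $fetch is a hypothetical injection point (not part of AnyEvent::HTTP);
# it defaults to a thin wrapper around http_get.
sub fetch_all {
    my ($urls, $fetch) = @_;
    $fetch ||= sub {
        my ($url, $cb) = @_;
        http_get $url, timeout => 30, $cb;   # timeout value is an assumption
        return;
    };

    my %result;
    my $cv = AnyEvent->condvar;

    $cv->begin;                 # guard so an empty URL list still completes
    for my $url (@$urls) {
        $cv->begin;             # one more outstanding request
        $fetch->($url, sub {
            my ($body, $hdr) = @_;
            $result{$url} = {
                status => $hdr->{Status},
                length => defined $body ? length $body : 0,
            };
            $cv->end;           # this request is done
        });
    }
    $cv->end;                   # release the guard
    $cv->recv;                  # run the event loop until every callback has fired
    return \%result;
}
```

Calling `fetch_all([ 'http://example.com/', 'http://example.org/' ])` would issue both requests before waiting on either. Note that AnyEvent::HTTP caps simultaneous connections per host via `$AnyEvent::HTTP::MAX_PER_HOST`, so a crawler hitting few hosts may need to adjust that setting.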
It really seems a shame that there are so many Perl modules dedicated to crawling tasks, and yet none of them has proved up to the job of being the back end of a high-performance crawler that makes the best use of network resources. The fact that people have dedicated so much time to writing such modules suggests that many Perl users have an interest in web crawling. I'm wondering (I'm new to PerlMonks, so please help me out here) whether there's anything we can do to set up a team of Perl developers to improve the situation and develop easy-to-use Perl modules that are up to the job.
Re^7: Async DNS with LWP
by BrowserUk (Patriarch) on Oct 07, 2010 at 06:59 UTC
by jc (Acolyte) on Oct 07, 2010 at 08:49 UTC
by Corion (Patriarch) on Oct 07, 2010 at 08:53 UTC
by BrowserUk (Patriarch) on Oct 07, 2010 at 09:56 UTC
by jc (Acolyte) on Oct 07, 2010 at 20:19 UTC
by BrowserUk (Patriarch) on Oct 07, 2010 at 23:36 UTC
by ikegami (Patriarch) on Oct 07, 2010 at 20:32 UTC
by BrowserUk (Patriarch) on Oct 08, 2010 at 12:18 UTC
by BrowserUk (Patriarch) on Oct 07, 2010 at 23:33 UTC
by ikegami (Patriarch) on Oct 08, 2010 at 06:03 UTC

Re^7: Async DNS with LWP
by rcaputo (Chaplain) on Oct 07, 2010 at 04:41 UTC