The question of URL queue selection module ?

Perl_Love has asked for the wisdom of the Perl Monks concerning the following question:

Hello, everyone! I am developing a crawler program. My setup is as follows:

CentOS Linux release 7.2.1511(Core)

This is perl 5, version 20, subversion 3 (v5.20.3) built for x86_64-linux

I currently use the BerkeleyDB module to store the URL queue. The script works, but it is not efficient: when I run multiple scripts against the database, they cannot read it at the same time. They block, and each script only gets its turn after the previous one has finished.

My question: I need to store the URL queue in a file-based database, not in memory. Which module is non-blocking, simple, and efficient?

I want to read a URL from the database and have it deleted immediately, or locked so that no other script can read it at the same time. Thank you!

Re: The question of URL queue selection module ? by beech (Parson) on Aug 12, 2016 at 23:00 UTC
Perl_Love wrote: "I currently use the BerkeleyDB module to store the URL queue. The script works, but it is not efficient: when I run multiple scripts against the database, they cannot read it at the same time. They block, and each script only gets its turn after the previous one has finished."
Hi, what do you mean by that? What exactly are you using, and how?
Re^2: The question of URL queue selection module ? by Perl_Love (Acolyte) on Aug 13, 2016 at 01:58 UTC
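Hi,

    ......
    # "or die" so the check applies to the constructor call,
    # not to the flag expression.
    my $env = BerkeleyDB::Env->new(
        -Home  => '/home/XXX/DB',
        -Flags => DB_CREATE | DB_INIT_MPOOL,
    ) or die "Cannot open environment: $BerkeleyDB::Error";

    my $db = tie(%hash, 'BerkeleyDB::Btree',
        -Filename => 'URL.db',
        -Flags    => DB_CREATE,
        -Env      => $env,
    ) or die "Cannot open URL.db: $BerkeleyDB::Error";

    while (1) {
        # Iterate over the tied hash and collect URLs until the
        # coroutine limit is reached.
        while (my ($k, $v) = each %hash) {
            if ($num == $coro_max) {
                &coro_url;
            }
            else {
                ++$num;
                push(@pages, $k) if $k ne '';
            }
        }
    }
    ......

Is BerkeleyDB blocking? I need non-blocking access.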
Re^3: The question of URL queue selection module ? by beech (Parson) on Aug 13, 2016 at 09:30 UTC
Um, hi :) There is a caution against a while loop like that in https://metacpan.org/pod/BerkeleyDB#Implicit-Cursors. I would try the helper module BerkeleyDB::Manager, specifically using cursor_stream() in place of the current while loop, and ...
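
Separately from the cursor advice, the requirement in the original question (read one URL and remove it so that no other script can pick it up) can be illustrated with core modules only. The following is a minimal sketch, not the BerkeleyDB approach discussed in this thread: it assumes the queue is a plain text file with one URL per line and guards it with flock. The helper name claim_next_url and the file layout are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;
use Fcntl qw(:flock SEEK_SET);

# Hypothetical helper (not part of any module): remove and return the
# first URL from a plain-text queue file, one URL per line.
# Returns undef when the queue is empty.
sub claim_next_url {
    my ($queue_file) = @_;

    # Open for read/write without truncating; the file must already exist.
    open(my $fh, '+<', $queue_file)
        or die "Cannot open $queue_file: $!";

    # Exclusive lock: other scripts calling this helper wait here, but only
    # for the short time it takes to take one URL off the queue.
    flock($fh, LOCK_EX)
        or die "Cannot lock $queue_file: $!";

    # Read the whole queue and drop empty lines.
    chomp(my @urls = <$fh>);
    @urls = grep { length } @urls;

    # Take the first URL (undef if nothing is left).
    my $url = shift @urls;

    # Rewrite the file with the remaining URLs while still holding the lock.
    seek($fh, 0, SEEK_SET) or die "Cannot seek in $queue_file: $!";
    truncate($fh, 0)       or die "Cannot truncate $queue_file: $!";
    print {$fh} "$_\n" for @urls;

    # Closing the handle flushes the data and releases the lock.
    close($fh) or die "Cannot close $queue_file: $!";

    return $url;
}
```

Each worker script would call claim_next_url('/path/to/queue.txt') in a loop and stop when it returns undef. Because the read and the delete happen under one exclusive lock, two scripts cannot claim the same URL. The sketch rewrites the whole file on every call, so it only suits small queues. A script that must never wait could pass LOCK_EX | LOCK_NB to flock and retry later when the lock is busy.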