Perl_Love has asked for the wisdom of the Perl Monks concerning the following question:

Hello, everyone! I am developing a crawler program. My environment is as follows:

CentOS Linux release 7.2.1511(Core)

This is perl 5, version 20, subversion 3 (v5.20.3) built for x86_64-linux

I currently use the BerkeleyDB module to store the URL queue. The script works, but it is not efficient: when I run multiple scripts against the same database, they cannot read it at the same time. They block, and each script gets its turn only after the previous one has finished.

So my question is: I need to store the URL queue in a file-based database, not in memory. Which module is non-blocking, simple, and efficient?

I want to read one URL from the database and have it deleted immediately, or locked so that no other script can read it at the same time. Thank you!
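
For illustration, one way to get this "read one URL and delete it immediately" behaviour is to guard a short critical section with flock. The sketch below is an assumption, not something from this thread: it swaps BerkeleyDB for the core SDBM_File module, and the helper names (with_queue, add_url, claim_url) and the "$base.lock" file are hypothetical. SDBM limits each key/value pair to roughly 1 KB, and as a hash it does not preserve insertion order, so treat this as a minimal sketch only.

```perl
use strict;
use warnings;
use Fcntl qw(:DEFAULT :flock);
use SDBM_File;

# Hypothetical helper: run $code->(\%queue) while holding an exclusive
# lock on "$base.lock". The database is tied only inside the lock.
sub with_queue {
    my ($base, $code) = @_;
    open(my $lock, '>>', "$base.lock") or die "Cannot open $base.lock: $!";
    flock($lock, LOCK_EX)              or die "Cannot lock $base.lock: $!";
    tie(my %queue, 'SDBM_File', $base, O_RDWR | O_CREAT, 0666)
        or die "Cannot tie $base: $!";
    my $result = $code->(\%queue);
    untie %queue;
    close($lock);    # closing the handle releases the lock
    return $result;
}

# Hypothetical helper: add one URL to the queue.
sub add_url {
    my ($base, $url) = @_;
    with_queue($base, sub { $_[0]{$url} = 1 });
}

# Hypothetical helper: remove one URL and return it, or undef if empty.
# No other process can see the URL between the read and the delete.
sub claim_url {
    my ($base) = @_;
    return with_queue($base, sub {
        my ($queue) = @_;
        my ($url) = each %$queue;              # any one key, or none
        delete $queue->{$url} if defined $url;
        return $url;
    });
}
```

Each worker would call claim_url('/path/to/urls') in a loop and fetch the page after the call returns. Access is still serialized, but only for the brief claim, not for the whole crawl.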

Replies are listed 'Best First'.
Re: The question of URL queue selection module ?
by beech (Parson) on Aug 12, 2016 at 23:00 UTC

    I currently use the BerkeleyDB module to store the URL queue. The script works, but it is not efficient: when I run multiple scripts against the same database, they cannot read it at the same time. They block, and each script gets its turn only after the previous one has finished.

    Hi,

    What do you mean by that? What are you using exactly and how?

      Hi

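      ......
      # "or" (not "||") so that die fires when the constructor fails
      my $env = BerkeleyDB::Env->new(
          -Home  => '/home/XXX/DB',
          -Flags => DB_CREATE | DB_INIT_MPOOL,
      ) or die "Cannot open environment: $BerkeleyDB::Error";

      # Tie %hash to a Btree database inside that environment
      my $db = tie(%hash, 'BerkeleyDB::Btree',
          -Filename => 'URL.db',
          -Flags    => DB_CREATE,
          -Env      => $env,
      ) or die "Cannot open URL.db: $BerkeleyDB::Error";

      while (1) {
          while (my ($k, $v) = each %hash) {
              if ($num == $coro_max) {
                  &coro_url;
              }
              else {
                  ++$num;
                  push(@pages, $k) if $k ne '';
              }
          }
      }
      ......
      Is BerkeleyDB blocking? I need something non-blocking.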
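
      As a point of comparison for "non-blocking", a plain flock call can be told to fail at once instead of waiting by adding LOCK_NB. This is only a sketch using core Perl, not a BerkeleyDB feature; the helper name try_lock and the lock file path are hypothetical.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Hypothetical helper: try to take an exclusive lock without waiting.
# Returns the lock filehandle on success, or undef if the lock is busy.
sub try_lock {
    my ($lock_path) = @_;
    open(my $lock, '>>', $lock_path) or die "Cannot open $lock_path: $!";
    return $lock if flock($lock, LOCK_EX | LOCK_NB);
    close($lock);
    return;    # another handle holds the lock; the caller can do other work
}
```

      A worker could call try_lock('/path/to/urls.lock'), touch the queue only when it gets a handle back, and close the handle to release the lock; when it gets undef it can keep processing pages it already has.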