Re: Sharing DBI between threads
by BrowserUk (Patriarch) on Jan 21, 2010 at 05:40 UTC
Hm. Even ignoring simple errors (naming the passed $sth as $db_source, and then doing nothing with it), I'm not at all sure that your sample code makes any sense at all.
There are more fundamental coding issues: you finish the sth and close the connection as soon as you've started the threads, which is inevitably going to be before they've had a chance to make much use of them. If it was ever going to work, you'd have to do that after you've joined the threads.
What do you expect to happen when you retrieve the results from a statement handle on multiple threads?
Assuming for a moment that DBI and MySQL had no problems with you calling from multiple threads, then there are two possibilities:
- Each thread retrieves its own copy of every row of data.
This seems unlikely as that would require DBI to track what data had been given to each thread.
- Each thread retrieves the next available row of data. I.e. each row is only seen by one thread.
Assuming it worked at all--I don't have MySQL to try--this seems like the most likely scenario.
But then that doesn't make much sense if each thread is talking to a different server. And you would have no control over which rows which thread would get. Unless all the servers are identical, how could whatever this data is be usable with whichever server it gets randomly assigned to?
Finally, if your concern is that populating the queue with all the returned data will consume too much memory, don't do it all at once. Instead, feed the queue slowly:
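use strict;
use warnings;
use DBI;
use threads;
use Thread::Queue;

my $Q = Thread::Queue->new;
my $dbh = DBI->connect(...);
my $sth = $dbh->prepare("SELECT ...");
$sth->execute();
my $thr1 = threads->new( \&some_sub, $Q );
my $thr2 = threads->new( \&some_sub, $Q );
# Feed rows to the workers, pausing whenever more than 10 are waiting.
while( my $ref = $sth->fetchrow_hashref() ) {
    $Q->enqueue( join $;, %$ref );
    sleep 1 while $Q->pending > 10;
}
$Q->enqueue( (undef) x 2 );   # one undef per thread tells it to stop
$thr1->join; $thr2->join;
$sth->finish();
$dbh->disconnect();

sub some_sub {
    my ( $Q ) = @_;
    # dequeue blocks until an item arrives; undef is the stop signal
    while( defined( my $item = $Q->dequeue ) ) {
        # -1 keeps a trailing empty value so the pairs stay aligned
        my %row = split $;, $item, -1;
        #...
    }
}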
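If joining the row on $; worries you (NULL columns come back as undef and trigger warnings), a variant is to enqueue the hash refs themselves, which Thread::Queue 2.00 and later copies into shared storage for you. The following is only a sketch of that idea: the @rows array is a made-up stand-in for what $sth->fetchrow_hashref() would return, and the $results queue and worker() name are mine, not part of the code above.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Hypothetical stand-in for rows fetched with $sth->fetchrow_hashref();
# undef plays the part of a NULL column.
my @rows = map { +{ id => $_, url => "http://example.com/$_", note => undef } } 1 .. 25;

my $work    = Thread::Queue->new;   # rows waiting to be processed
my $results = Thread::Queue->new;   # ids reported back by the workers

sub worker {
    my ( $in, $out ) = @_;
    # dequeue blocks until an item arrives; undef is the stop signal.
    while ( defined( my $row = $in->dequeue ) ) {
        $out->enqueue( $row->{id} );    # real per-row work would go here
    }
}

my @threads = map { threads->create( \&worker, $work, $results ) } 1 .. 2;

for my $row (@rows) {
    $work->enqueue($row);               # the hash ref is copied into the queue
    sleep 1 while $work->pending > 10;  # throttle, as in the loop above
}
$work->enqueue( (undef) x @threads );   # one stop signal per worker
$_->join for @threads;

# Drain the ids the workers reported.
my @seen;
while ( defined( my $id = $results->dequeue_nb ) ) {
    push @seen, $id;
}
@seen = sort { $a <=> $b } @seen;
print scalar(@seen), " rows processed\n";
```

Each row is still seen by exactly one worker; which worker gets which row is not under your control.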
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
BrowserUk, thank you, really helpful advice about checking $Q->pending, I think I will use this.
Re: Sharing DBI between threads
by ikegami (Patriarch) on Jan 21, 2010 at 05:08 UTC
No. It makes no sense. Your threads would send conflicting commands (assuming they can get themselves understood) since they'd be sharing a comm channel and a session. For example, imagine if the second thread tried to start a transaction after the first one has already started one.
Have each thread create its own connection.
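A rough sketch of what that looks like, with every specific an assumption: the DSN below points at DBD::ExampleP, the toy driver bundled with DBI, only so the snippet runs without a server (you would substitute your real dbi:mysql:... DSN and credentials), and worker() is a made-up name.

```perl
use strict;
use warnings;
use threads;
use DBI;

# Assumption: 'dbi:ExampleP:' is a stand-in so this runs without a database
# server. Replace it with your real DSN, user and password.
my $dsn = 'dbi:ExampleP:';

sub worker {
    my ($id) = @_;
    # Connect inside the thread, so the handle is never shared with another one.
    my $dbh = DBI->connect( $dsn, undef, undef, { RaiseError => 1 } );
    my $alive = $dbh->ping ? 1 : 0;
    # ... this thread's own prepare/execute/fetch calls would go here ...
    $dbh->disconnect;
    return "$id:$alive";
}

my @threads = map { threads->create( \&worker, $_ ) } 1 .. 2;
my @status  = map { $_->join } @threads;
print "@status\n";
```

The point is only that the connect, the queries and the disconnect all happen inside the same thread; no $dbh or $sth is ever passed to threads->create.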
The reason for this is to create a web scrapper which runs several threads each using different proxy/network interface and each thread sleeps some time between requests to avoid ban.
Wait, what's the point of using threads if they're just going to sleep? Using threads in a scraper only makes sense if you're querying different servers, but you said they're all querying one server.
Also, what's the point of using different proxies?
Update: Added bottom half
I found it unclear whether he's trying to write a polite spider (and failing) or he's intending to hammer the site as hard as he can without getting banned, thus the question.
Re: Sharing DBI between threads
by DrHyde (Prior) on Jan 21, 2010 at 10:37 UTC
The author of DBI recommends against using it with a threaded perl. He thinks it's such a bad idea that when you try to build it, you get this warning:
*** You are using a perl configured with threading enabled.
*** You should be aware that using multiple threads is
*** not recommended for production environments.
Every program is a threaded program, even if it only has one thread.
If DBI works correctly in all of those, then if some of them create a second thread:
use DBI;
use threads;
async { sleep 1 while 1 }->detach;
# the rest of the script.
it'll continue to work correctly. Provided you don't use DBI within that other thread!
Then what technology would you suggest for a project like this? Is there any interpreter that has stable, production-ready support for multithreading? I had a look at Ruby and Python. Ruby seems like a good compromise between Perl with its extremely rich syntax and Python with its oversimplified syntax. However, Ruby's port of WWW::Mechanize is in alpha release, as is curb (curl bindings). Also it is new and has very poor documentation in contrast with Perl. However, Ruby's syntax is much clearer than Perl's. Perl has really unsafe features - for example it allows interchange of HASH references and integers without any warning even with `use strict;` - something that even C doesn't allow. Also, the need to use all these complex references for nested arrays/hashes is very unclear to me as a mostly PHP programmer. I couldn't use PHP for this project because it seems PHP lacks any support for threads at all. Now I see threads in other languages are also experimental.
Finally, I know that the best way to do this job would be to use async I/O, however that is a much more complex solution. So I am basically in search of a technology that has something like WWW::Mechanize and good production-ready support for threading.
The problem is not Perl, and the problem is not thread support in Perl. Ikegami explained the real problem in Re: Sharing DBI between threads: a single database connection cannot know which of the various threads is sending commands, so the database gets confused or damaged. You need a dedicated database connection for each single thread, period. (And you need it in every language that allows you to use threads and a database connection.)
DBI WARNS not to use a threaded perl in a production environment. This does not mean DBI does not work with a threaded perl. It means that there MAY BE some issues with a threaded perl. You should perhaps read that warning as a better-safe-than-sorry warning: "it will very probably work as well, but we cannot guarantee that as well as for a non-threaded perl". Every recent perl for Windows (with faked fork() support) is threaded, as are most perls built by the various Linux distributions. And all of those perls run pretty well with the DBI.
Perl has really unsafe features - for example it allows interchange of HASH references and integers without any warning even with `use strict;`
This sounds pretty strange, like a misunderstanding of some essential concepts of Perl. If not, could you show us a working example, and a copy of perl -V that exhibits that problem?
Alexander
--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)