Re^2: mutiple threading

Using DBI from multiple threads is (or has been) generally deemed not a good idea because some or all implementations of DBI, and/or the underlying C libraries were not thread safe.

To quote the DBI pod:

Threads and Thread Safety
Perl 5.7 and later support a new threading model called iThreads. (The old ``5.005 style'' threads are not supported by the DBI.)
In the iThreads model each thread has it's own copy of the perl interpreter. When a new thread is created the original perl interpreter is 'cloned' to create a new copy for the new thread.
If the DBI and drivers are loaded and handles created before the thread is created then it will get a cloned copy of the DBI, the drivers and the handles.
However, the internal pointer data within the handles will refer to the DBI and drivers in the original interpreter. Using those handles in the new interpreter thread is not safe, so the DBI detects this and croaks on any method call using handles that don't belong to the current thread (except for DESTROY).
Because of this (possibly temporary) restriction, newly created threads must make their own connctions to the database. Handles can't be shared across threads.
But BEWARE, some underlying database APIs (the code the DBD driver uses to talk to the database, often supplied by the database vendor) are not thread safe. If it's not thread safe, then allowing more than one thread to enter the code at the same time may cause subtle/serious problems. In some cases allowing more than one thread to enter the code, even if not at the same time, can cause problems. You have been warned.

Unless you are using a new version of DBI, and the above paragraph has been updated, you should not share db handles across threads. At best it will fail early and obviously. At worst it will run, but give you mixed up or corrupted data.

It is certainly safe to run overlapping queries concurrently from different processes, so the limitation lies within the libraries/implementation, rather than inherently with threads.

However, if the parallelised queries are against the same DB and/or tables, then there may be little benefit derivable from running them concurrently, as the RDBMS may need to serialise them at the DB end.

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

Comment on Re^2: mutiple threading

Replies are listed 'Best First'.
Re^3: mutiple threading by atemon (Chaplain) on Aug 09, 2007 at 15:51 UTC
as per the suggestions by perlCrazy, I think a possible thread safe re-write of above code can be use threads; use DBI; $SQL_1 = "SELECT * FROM x"; $SQL_2 = "SELECT * FROM y"; $SQL_3 = "SELECT * FROM z"; sub start_thread { my ($SQL) = @_; my $dbh = DBI->connect( <your connection string> ); # print('Thread started: ', SQL, "\n"); my $sth = $dbh->prepare($SQL); $sth->execute(); < Add your code here > } my $thr1 = threads->create('start_thread', $SQL_1); my $thr2 = threads->create('start_thread', $SQL_2); my $thr3 = threads->create('start_thread', $SQL_3); $thr1->join(); $thr2->join(); $thr3->join(); [download] There are three sides to any argument..... your side, my side and the right side.	[reply] [d/l]
Re^4: mutiple threading by BrowserUk (Patriarch) on Aug 09, 2007 at 16:05 UTC
But BEWARE, some underlying database APIs (the code the DBD driver uses to talk to the database, often supplied by the database vendor) are not thread safe. If it's not thread safe, then allowing more than one thread to enter the code at the same time may cause subtle/serious problems. In some cases allowing more than one thread to enter the code, even if not at the same time, can cause problems. You have been warned. Sorry to harp on, but unless the versions of DBI being used by perlcrazy and the drivers that underlie it have been explicitly declared thread safe, your example may or may not work. May or may not give silently corrupt results. DBI from one thread is safe. Multiple threads at a time may not be. Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. "Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."	[reply]
Re^5: mutiple threading by atemon (Chaplain) on Aug 24, 2007 at 05:01 UTC
Hi, I found the following in CPAN Making your module threadsafe Since 5.6.0, Perl has had support for a new type of threads called interpreter threads (ithreads). These threads can be used explicitly and implicitly. Ithreads work by cloning the data tree so that no data is shared between different threads. These threads can be used by using the threads module or by doing fork() on win32 (fake fork() support). When a thread is cloned all Perl data is cloned, however non-Perl data cannot be cloned automatically. Perl after 5.7.2 has support for the CLONE special subroutine. In CLONE you can do whatever you need to do, like for example handle the cloning of non-Perl data, if necessary. CLONE will be called once as a class method for every package that has it defined (or inherits it). It will be called in the context of the new thread, so all modifications are made in the new area. Currently CLONE is called with no parameters other than the invocant package name, but code should not assume that this will remain unchanged, as it is likely that in future extra parameters will be passed in to give more information about the state of cloning. If you want to CLONE all objects you will need to keep track of them per package. This is simply done using a hash and Scalar::Util::weaken(). Perl after 5.8.7 has support for the CLONE_SKIP special subroutine. Like CLONE, CLONE_SKIP is called once per package; however, it is called just before cloning starts, and in the context of the parent thread. If it returns a true value, then no objects of that class will be cloned; or rather, they will be copied as unblessed, undef values. This provides a simple mechanism for making a module threadsafe; just add sub CLONE_SKIP { 1 } at the top of the class, and DESTROY() will be now only be called once per object. Of course, if the child thread needs to make use of the objects, then a more sophisticated approach is needed. Like CLONE, CLONE_SKIP is currently called with no parameters other than the invocant package name, although that may change. Similarly, to allow for future expansion, the return value should be a single 0 or 1 value. So do you think I can make my DBI thread safe ? please let me know your thoughts. I am asking about threadsafty of DBI and not with respect to above program. Thanks in advance. --VC There are three sides to any argument..... your side, my side and the right side.	[reply]
Re^6: mutiple threading by BrowserUk (Patriarch) on Aug 24, 2007 at 05:31 UTC
Re^4: mutiple threading by Anonymous Monk on Sep 03, 2007 at 10:59 UTC
Hi, I agree with the above suggestion. if only three connections are required, above code will work fine. But, let us consider some situation when 100-1000 connection will be made, in that case only part of connection will be working fine, other DB connection will be rejected due to more connections. Please suggest some way to solve this.....	[reply]
Re^5: mutiple threading by atemon (Chaplain) on Sep 03, 2007 at 12:01 UTC
Hi, Most important thing is why do you need 100s of connections from same program to the same database? You need to re think about the architecture of the program as DB connections and their management are one among the most costly part of a program. We always try to have minimum number of connections to the database. Thats why we go for Apache::DBI - persistent connection. With this we share same database connection between more than one HTTP connections! ie we don't even have a new DB connection for every request, but we are reusing same database connection. Its always better to use a single thread (with single db connection) or a very few(<5) threads to deal with all database queries and manipulate the data in other 100s of threads as BrowserUk suggests. *In this same thread itself, one of the most experienced monk BrowserUk is asking why do we need 3 connections (Re^6: mutiple threading). Then I cant even imagine about 1000 connections* to have 100 - 1000 of connections, set the maximum number of allowed connections to your database (in your database configuration) to multiple of the actual number of connections per an instance. Also consider upgrading your systems to multiply CPUs and multiply your RAM to some values in GBs. Imagine your program has 200 connections and some 5 instance run at same time, you need 1,000 connections! (?) So re-think about the arch. The solution I suggested is valid only with all faults pointed out by BrowserUk. My solution just "an answer to the question" without considering the problems it can bring. Cheers ! --VC There are three sides to any argument..... your side, my side and the right side.	[reply]