fx has asked for the wisdom of the Perl Monks concerning the following question:

After my last post regarding threading with Perl against Oracle 9i I decided to go ahead and perform some testing. My system runs Perl 5.8.2 with ithreads compiled using GCC under Solaris 8 for SPARC.

My first test was to have each thread create its own connection the database:

#!/opt/perl-5.8.2/bin/perl use threads; use DBI; @threads; printf "Starting thread "; for ($i = 0; $i < 50; $i++) { printf "$i "; $threads[$i] = new threads(\&do_db); } printf "\n"; printf "Joining thread "; for ($i = 0; $i < 50; $i++) { printf "$i "; $thread = $threads[$i]; $thread->join; } printf "\n"; sub do_db { my $dbh = DBI->connect("dbi:Oracle:host=localhost;sid=testdb1" +, "user1", "pass1") || die $!; }

Watching the Oracle instances, I could see separate connections to the database being created as expected. All the threads are created but then the program dumps core. This may be something to do with the way I'm joining the threads at the end but that's not the big issue...

I then tried using, as the DBD::Oracle 1.14 documentation describes, the ora_dbh_share connection attribute which means that calls to DBI->connect do not necessarily return a reference to a unique database connection:

#!/opt/perl-5.8.2/bin/perl use threads; use DBI; our $orashr : shared = '' ; @threads; printf "Starting thread "; for ($i = 0; $i < 50; $i++) { printf "$i "; $threads[$i] = new threads(\&do_db); } printf "\n"; printf "Joining thread "; for ($i = 0; $i < 50; $i++) { printf "$i "; $thread = $threads[$i]; $thread->join; } printf "\n"; sub do_db { my $dbh = DBI->connect("dbi:Oracle:host=localhost;sid=testdb1" +, "user1", "pass1", { ora_dbh_share => \$orashr} ) || die $!; }

This only gets as far as creating 2-5 threads before dumping core so this can't be anything to do with how I am calling ->join as it's not even getting to that point.

The DBD::Oracle 1.14 documentation implies that threading is safe. It mentions that you need at least Perl 5.8.0 but I fulfil that requirement. The DBI 1.39 documentation mentions the use of ithreads in Perl 5.7 and later and even though it mentions that some database APIs are not thread safe, DBD::Oracle 1.14 claims that its ok.

Can anyone shed any light on what is happening here? The application I am developing is updating or deleting 30,000 records in a database of 6,000,000 or so on a daily basis (read: 6,000,000 in the main table which supplies a foreign key to 5 other tables each containing between 100,000 and 9,000,000 records). The SQL itself is rather slow acting due to the sheer size of the tables and a single threaded application blocks until the SQL is complete. I would like to fire off many SQL statements at once and as the box is a 4 x 750Mhz SPARC with 8Gb physical RAM I should be able to hit it fairly hard.

All comments welcomed.

== fx, Infinity Is Colourless

Replies are listed 'Best First'.
Re: Threading (Perl 5.8 ithreads) with Oracle 9i dumps core
by BrowserUk (Patriarch) on Dec 19, 2003 at 18:23 UTC

    Looking at your first snippet, and going on gut reaction, I think that the problem is that one or more elements of the code that is being loaded under the auspices of use DBI;, isn't thread-safe.

    That is to say, even though each of your threads is getting it's own instance of DBI (and everything that encapsulates), somewhere in that mix, the is a piece of XS or C code that is retaining state in some internal variable. When you create multiple instances within the same process, although Perl is keeping the various perl-level state separated, the internal state at the XS or C level is being shared without any form of interlocking, and it is this that is probably causing the core dump.

    Even the standard C-libraries have some standard calls which retain internal state (eg. strtok()), and unless the libraries you are using are specifically thread-enabled, allocating and accessing their internal state indexed by calling thread id, then using such calls in threaded apps is usually fatal.

    XS code for modules written prior to the existance of perl's threading will be especially vulnerable to this, and unless the code has been updated very recently to accomodate threading, DBI and its dependanceies are quite lightly to suffer this problem.

    You might be able to use liz's forks module as a drop-in replacement for threads and avoid these problems. It's definately worth a look.


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
    Hooray!

Re: Threading (Perl 5.8 ithreads) with Oracle 9i dumps core
by Joost (Canon) on Dec 19, 2003 at 18:21 UTC
    Any special reason to use threads at all? They're still not 100% stable, so if you don't need them, you could just as well fork() your program and get a couple of seperate connections to do the work.

    Using fork is probably more efficient too, if you're on some kind of Unix. You might even consider forks, which claims to be able to replace threads, though I've only played with it sofar, and not used it in any real application.

    HTH,
    Joost.

      The reason for using threads is that I have somewhere in the region of 30,000 updates to fire against a database and due to the database size, each update takes about 1 second. That's 500 minutes or 8 hours. That's far too long. In fact, I only really have a window of a couple of hours. I was hoping to fire off 10, 20 or 50 threads or something and just let them to the work.

      The other reason for threads is the ease at which I can share info (using queues).

      I will have a look at forks.

      Thanks.

Re: Threading (Perl 5.8 ithreads) with Oracle 9i dumps core
by perrin (Chancellor) on Dec 19, 2003 at 19:22 UTC
    It's probably DBD::Oracle or the OCI code it loads that is not safe. Regardless, using threads here is a bad idea. Perl threads are currently slower than forked processes, in addition to being less safe. You have nothing to gain by using threads.
Re: Threading (Perl 5.8 ithreads) with Oracle 9i dumps core
by liz (Monsignor) on Dec 19, 2003 at 21:48 UTC
    Looking at the DBD::Oracle documentation, I wonder whether the problem isn't caused by the fact that you're setting up the connection inside the thread.

    Have you tried running this:

    my $dbh = DBI->connect("dbi:Oracle:host=localhost;sid=testdb1", "user1 +", "pass1", +{ ora_dbh_share => \$orashr} ) || die $!;
    in the main thread before you start the other threads? $dbh should be cloned to each thread automatically (and probably correctly) because the ora_dbh_share attribute was set.

    Liz

      I have just tried this - the program now dumps core before starting any threads. I don't even get the Starting thread message that should appear before the first new threads is called. The core file is now only 2.5Mb whereas it used to be upwards of 10Mb but I suppose that is because there are no threads to increase the size of the core file.

      Thanks for the advice all the same.

Re: Threading (Perl 5.8 ithreads) with Oracle 9i dumps core
by richardX (Pilgrim) on Dec 21, 2003 at 09:07 UTC
    Perl may not be the whole answer.

    In Oracle if I have a lot of deletes to do in a large database, I use partitioned views to spread the work out. I also drop as many indexes as I can so that the index overhead is removed until I am done deleting and then I reindex.

    The partitioned views will help the update issue also, since the actual database will be spread across multiple disks. Also make sure that your temp space is not on the same disks as your database and indexes.

    I would also optimize my segment management and use stored procedures for the updates and deletes. You can have Perl call the stored procedures if you want.

    Overall I would tune my database side first, and then work on concurrent SQL processes.

    Richard

    There are three types of people in this world, those that can count and those that cannot. Anon

      I thought about dropping and then recreating the indices but the creation takes 60-80 minutes depeding on what else the machine is doing. I believe it takes so long as we use functional indicies.

      As you can probably tell, I am not an Oracle DBA. I will ask the DBA to look into partitioned views and segment management.

      Many thanks.

        It just goes to show that you should always think about the big picture. After I went back to the DBA and complained about the speed it was discovered that one of the foreign keys used for joining was not indexed. Adding this index speeded one of the cascaded deletes up from 3-5 seconds to "very quick".

        I now have reasonable speed from the SQL calls but my problem still lies in the fact that it can take hours to apply a "big" update which I have just been told is around 110,000 records.

        I am therefore still looking for some way to increase the performance of my code.