nandn has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I've read about the various limitations of DBI/DBD related to multi-threading. However, I imagine someone has found a workaround or alternative architecture. Goal: trigger database-intensive processes when certain events take place. I've tried to start a new thread when events take place and also to fork the main process. The Perl interpreter crashes. Does anyone have a solution? My configuration: Windows NT, Perl 5.8.8, DBI, DBD-mysql. Thanks.

Replies are listed 'Best First'.
Re: Multi-thread database script
by ikegami (Patriarch) on Apr 24, 2007 at 04:07 UTC

    I think you'll have better luck if you only load DBI in the child.

    my $pid = fork(); if (not defined $pid) { # Fork error ... } elsif (not $pid) { # Child process require DBI; # Not "use" to load in child only. ... } else { # Parent process ... }

    This is based on a vague memory of something I once read. I hope it works.

Re: Multi-thread database script (spawn)
by tye (Sage) on Apr 24, 2007 at 05:19 UTC

    Using Perl on Win32 I would avoid threads (Perl sucks at threads1) and avoid fork (fork is emulated by Win32 Perl and only somewhat successfully and this emulation also uses Perl threads, which suck, in case I didn't just mention that). (:

    As ikegami notes, if you do use threads (via fork or otherwise), be sure to connect to the DB from the child. Perl threads create a new instance of the Perl interpretter and simple scalar values get copied fine but things with operating system magic attached (like file handles such as network connections to a database) can't be copied as easily. And the module isn't written to expect a single connection to be copied and used by two instances at once. So connecting and then forking or creating a new Perl thread will usually not work.

    But because you are using both Perl and Win32, I'd instead spawn a DB client and feed tasks to it. That is, have either a separate Perl script or just a separate way of invoking the same Perl script and then invoke that in a child process and when an event happens, hand off the time-consuming task to it.

    This can sometimes be handled by something as simple as:

    use IO::Handle; open DBCHILD, "| $^X $0 -dbchild" or die ...; DBCHILD->autoflush( 1 ); # ... sub eventHandler { # ... print DBCHILD $eventTaskDesc, $/; }

    Then the DB child just reads STDIN for things that it should do.

    - tye        

    1 The reason good threads are nice is that they are more efficient in memory than fork, share everything implicitly (which is also why they are not for the unprepared as preventing race conditions due to all this sharing requires a significant investment in tools and techniques), and share very efficiently. Perl threads, however, are less memory efficient than regular fork, share almost nothing implicitly, and don't even share efficiently.

      As far as I could tell from quick glances, there has been a lot of recent p5p activity by jdhedden and others around threads. Would people in the know care to discuss what kinds of improvement we may expect on the threads front with 5.10, please? This may also be a good opportunity for Parrot developers to update us on the state of Parrot threads and developments since Parrot, threads & fears for the future.. Cheers.

        The problems with Perl 5 threads are not due to the developers being incompetent nor for lack of effort. Any improvements to the Perl 5 threading model are going to be small improvements because the big problems are fundamental. You can't just take a monolithic system like Perl 5 and throw threads into the design as an after-thought. And Perl 5, unluckily, is an especially good example of a system that strongly resists the addition of good threading.

        At this point, I suspect that you'd often get better performance on Unix Perl if threading was implemented using fork with IPC used for sharing variables (especially if mmap were used). That is how fundamentally unlike threads the current Perl 5 threading model has become after years of trying and failing to make Perl work with a more thread-like model.

        - tye        

Re: Multi-thread database script
by renodino (Curate) on Apr 24, 2007 at 16:05 UTC
    Assuming you're actually using threads, be advised that threading with the DBI is a bit hit or miss. It requires (a) a thread-safe database driver and (b) a threads-aware DBI driver.

    I believe DBD::MySQL is one of the few (only ?) DBD's which have implemented the internal brain surgery needed to swap things between threads, and I think your app has to do some extra work to enable it.

    Attempting to recover a database connection across a fork() (regardless the platform or API) can be a dangerous operation (as you've discovered), due to the complex interactions most database drivers have with their connection to the database. So, as ikegami pointed out, you'll need to create your connection within the child process. And in most instances, this also applies to Perl threads. Oh, BTW, on Windows, fork() is threads, but probably less stable than regular threads.

    I'll toss in DBIx::Threaded as another option. It uses apartment threading to isolate the connection in its own thread, and distributes proxies to other threads for access. Essentially, its a poor man' Actor model. However, as it currently relies on threads::shared, performance in contentious systems is less than ideal.

    As to other comments wrt threads here, I'll suggest that the more recent releases of threads and threads::shared permit reducing thread footprint, add add'l functionality, and improve stability (many thnx to jdhedden's efforts). However, ithreads still suffers from lengthy startup times and large memory footprints. I'll offer the following advice:

    • create thread pools early before you've loaded a bunch of modules
    • consider a shared nothing Actor model as much as possible
    I'll offer Thread::Apartment as a means to accomplish the latter.

    Perl Contrarian & SQL fanboy
Re: Multi-thread database script
by Joost (Canon) on Jun 04, 2007 at 20:25 UTC
    I'm a bit late to this party. :-)

    I'm not too sure about windows, but DBI / DBD::mysql on linux with threads works fine, for certain values of fine:

    You can't share handles over multiple thread. You should probably not use a handle that's in scope from the parent thread either. Every thread should have its own database handle(s), so create new connections at the start of a thread if you need them.

    You need to compile DBD::mysql with thread support. AFAIK most default builds are done without thread support even on systems that have a threaded perl.

    On my (linux) system, that means running mysql_config  --libs_r and passing the output of that to Makefile.PL prior to compilation:

    $ mysql_config --libs_r -L/usr/lib/mysql -lmysqlclient_r $ perl Makefile.PL --libs "-L/usr/lib/mysql -lmysqlclient_r"
    Note that libmysqlclient_r.so is the threaded version of the libmysqlclient library on linux, libmysqlclient.so is the default, non-threaded version.

    PS: starting a new thread in perl is slow. You might not want to do it for each event. Having a pool of threads is probably more efficient.

Re: Multi-thread database script
by Anonymous Monk on Apr 24, 2007 at 17:43 UTC
    DBI is an XS module that is not thread-safe. See the "Using non-threadsafe modules" section of the "threads" POD for information on working around this.