krish1982 has asked for the wisdom of the Perl Monks concerning the following question:

I have tried Parallel::ForkManager with DBI, but I got the error DBD::mysql::st execute failed: Lost connection to MySQL server during query. Here is the sample code. I want to run the query between a low and a high value (I have split the work into 10k-record chunks):
use Parallel::ForkManager;

my $pm = new Parallel::ForkManager(50);
my $db = krish::DB->new or die $!;    # it has all the connection details

while ( $low < $high ) {
    # Some value manipulation

    my $pid = $pm->start and next;

    # db_execution returns a statement handle
    while ( my $sth = db_execution( $db, $low, $high ) ) {
        ...    # fetch row operation
    }
    $pm->finish;
}

sub db_execution {
    ...
    my $dbh = $db->connect( 'students' ) or die $!;
    my $sth = $dbh->prepare( $sql ) or die "$!:" . $dbh->errstr;
    $sth->execute or die "$!:" . $sth->errstr;
    ...
}
The same code runs fine without the parallel processing. What is the issue, and how do I resolve it?

Replies are listed 'Best First'.
Re: Parallel Processing Database Query
by psini (Deacon) on Jul 28, 2009 at 11:35 UTC

    I think that your problem is that you can't share a single DB connection between forked processes. Try opening the connection after the fork.
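
    A minimal sketch of that approach, assuming plain DBI against MySQL (the DSN, credentials, table, and chunk ranges below are placeholders, not from the original post). Each child opens its own handle after $pm->start and closes it before $pm->finish:

        use strict;
        use warnings;
        use DBI;
        use Parallel::ForkManager;

        my ( $user, $pass ) = ( 'dbuser', 'dbpass' );    # placeholders
        my @chunks = map { [ $_ * 10_000, ( $_ + 1 ) * 10_000 - 1 ] } 0 .. 9;

        my $pm = Parallel::ForkManager->new(10);

        for my $chunk (@chunks) {
            $pm->start and next;    # parent loops on; child continues below

            # Connect *inside* the child, after the fork
            my $dbh = DBI->connect( 'dbi:mysql:database=students',
                $user, $pass, { RaiseError => 1 } );

            my $sth = $dbh->prepare(
                'SELECT id, name FROM students WHERE id BETWEEN ? AND ?');
            $sth->execute(@$chunk);
            while ( my $row = $sth->fetchrow_hashref ) {
                # ... per-row work here ...
            }

            $dbh->disconnect;
            $pm->finish;    # child exits here
        }
        $pm->wait_all_children;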

    Rule One: "Do not act incautiously when confronting a little bald wrinkly smiling man."

Re: Parallel Processing Database Query
by roboticus (Chancellor) on Jul 28, 2009 at 12:19 UTC
    krish1982:

    Another possible problem is simply that your request is taking too long and the connection times out. If that's the case, your SQL may be working too hard; review your table to see whether it's indexed appropriately for the query.
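
    For instance, one quick check from Perl (a sketch; the table, columns, and credentials are invented) is to run MySQL's EXPLAIN through DBI and look at the key and rows columns of the plan:

        use strict;
        use warnings;
        use DBI;

        my $dbh = DBI->connect( 'dbi:mysql:database=students',    # placeholders
            'dbuser', 'dbpass', { RaiseError => 1 } );
        my ( $low, $high ) = ( 0, 9_999 );    # example chunk

        my $plan = $dbh->selectall_arrayref(
            'EXPLAIN SELECT id, name FROM students WHERE id BETWEEN ? AND ?',
            { Slice => {} },
            $low, $high,
        );
        for my $step (@$plan) {
            printf "table=%s key=%s rows=%s\n",
                $step->{table} // '-', $step->{key} // 'NONE', $step->{rows} // '?';
        }

    If key comes back NULL and rows is close to the table size, the query is scanning rather than using an index.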

    One note on finish: Parallel::ForkManager's $pm->finish is required here (it's what ends each child process), but don't confuse it with DBI's $sth->finish, which the DBI documentation says is rarely needed. If you've added $sth->finish calls just to make errors go away, you might be masking a bug elsewhere in your program.

    Notes on perl vs. databases

    When you use perl with a database, make sure you're doing the work in the correct place. The two most common problems I see are having the database retrieve a large quantity of records and then writing perl code to:

    (a) compute statistics (average, maximum/minimum, sum, etc.), or

    (b) find the records meeting specific criteria.

    Both of these force the database to scan a large number of records and ship them to the perl process, and they force the perl process to read all those records and do the selection and processing itself.

    The primary problem with both is that you spend a lot of resources transporting a large amount of data that you're going to discard anyway. Remember what has to happen to send a record from the database to the perl process:

    • DB server must locate the records
    • DB server must format the record
    • DB server must issue I/O requests to the OS to send the data
    • perl process must receive each record
    • perl process must decode the record into internal form

    Since SQL gives you a lot of flexibility in computing statistics and selecting records, you will reduce the burden on both the DB server and your perl process if you use SQL to compute statistics and select records. So if you're unfamiliar with SQL, spend a little time learning some more about it. Alternatively, describe to your DBA what data you're trying to get and ask how to do it in SQL. You may be surprised to see how simple it can be.
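
    As a sketch of the difference (assuming an invented students table with a numeric score column), compare shipping every row to perl with computing the statistic in SQL:

        use strict;
        use warnings;
        use DBI;

        my $dbh = DBI->connect( 'dbi:mysql:database=students',    # placeholder DSN
            'dbuser', 'dbpass', { RaiseError => 1 } );

        # Wasteful: fetch every row just to average one column in perl
        my $rows = $dbh->selectall_arrayref('SELECT score FROM students');
        my $sum  = 0;
        $sum += $_->[0] for @$rows;
        my $avg_in_perl = @$rows ? $sum / @$rows : 0;

        # Better: let the database do the work and send back a single value
        my ($avg_in_sql) = $dbh->selectrow_array('SELECT AVG(score) FROM students');

    The second form moves one number over the wire instead of the whole column.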

    That said, I'm not sure you're doing the work in the wrong location, but I thought I'd mention it, just in case. But in most cases where I see people trying to speed up a program with perl and a database, they're doing the work in the wrong location. I find it uncommon to see speed problems when the tasks are divided up correctly.

    ...roboticus

    Update: Added the italicised text to make the sentence mean something...

      But in most cases where I see people trying to speed up a program with perl and a database, they're doing the work in the wrong location. I find it uncommon to see speed problems when the tasks are divided up correctly.

      BTW, the latest versions of PostgreSQL include support for Perl inside the database with PL/Perl, so you can sometimes get the best of both worlds.
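
      For example (a sketch; the function and its logic are invented), a PL/Perl function is ordinary Perl wrapped in a CREATE FUNCTION:

          -- Runs inside PostgreSQL; the body between the $$ markers is Perl
          CREATE OR REPLACE FUNCTION grade_label(score integer) RETURNS text AS $$
              my ($score) = @_;
              return $score >= 50 ? 'pass' : 'fail';
          $$ LANGUAGE plperl;

          -- Usage: SELECT name, grade_label(score) FROM students;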

      Just a related comment: when using ETL tools (e.g. Informatica, Ab Initio, Teradata, Talend, etc.), the best practice is to do your joins and computations in those tools, because you get the benefit of massively parallel processing.

      The exception would be if you have a parallel database engine like Netezza.

Re: Parallel Processing Database Query
by dont_you (Hermit) on Jul 29, 2009 at 06:58 UTC
    I've seen that error before when using DBI and Parallel::ForkManager. The problem arises when a $dbh is inherited from the parent and several children try to use the same connection at the same time. I solved it as follows:
        my $fork_manager = new Parallel::ForkManager(10);
        $fork_manager->start and next;    # fork

        # Clone the parent's dbh for this child
        my $dbh_child = $dbh->clone();
        $dbh->{InactiveDestroy} = 1;      # child exit must not close the parent's connection
        $dbh = $dbh_child;
    So you can try:
        sub db_execution {
            ...
            my $dbh = $db->connect( 'students' ) or die $!;

            # Clone the handle and keep the original alive across child exits
            my $dbh_child = $dbh->clone();
            $dbh->{InactiveDestroy} = 1;
            $dbh = $dbh_child;

            my $sth = $dbh->prepare( $sql ) or die "$!:" . $dbh->errstr;
            $sth->execute or die "$!:" . $sth->errstr;
            ...
        }
    Hope this helps