Re: Parllel Processing Database Query

krish1982:

Another problem may simply be that your request is taking too long, and the connection times out. If that's the case, your SQL may be working too hard. If so, you may want to review your table to see if it's indexed properly or similar.

Are you sure you need the $pm->finish statement? The DBI documentation mentions that it's rarely needed. You might be masking a code bug by including it in your program.

Notes on perl vs. databases

When you use perl with a database, make sure you're doing the work in the correct place. The most common two problems I see are having the database retrieve a large quantity of records and then writing perl code to:

(a) compute statistics (average, maximum/minimum, sum, etc.), or

(b) find the records meeting a specific criteria.

Both of these force the database to scan through a large number of records and transport them to the perl process. They also force the perl process to read the large number of records and perform the selections and processing.

The primary problem with both is that you're having to spend a lot of resources to transport a large amount of data that you're going to discard anyway. Remember that to send a record from the database to the perl process, the computers involved must:

DB server must locate the records
DB server must format the record
DB server must issue I/O requests to the OS to send the data
perl process must receive each record
perl process must decode the record into internal form

Since SQL gives you a lot of flexibility in computing statistics and selecting records, you will reduce the burden on both the DB server and your perl process if you use SQL to compute statistics and select records. So if you're unfamiliar with SQL, spend a little time learning some more about it. Alternatively, describe what data you're trying to get to your DBA and ask him how to do it in SQL. You may be surprised to see how simple it can be.

That said, I'm not sure you're doing the work in the wrong location, but I thought I'd mention it, just in case. But in most cases where I see people trying to speed up a program with perl and a database, they're doing the work in the wrong location. I find it uncommon to see speed problems when the tasks are divided up correctly.

...roboticus

Update: Added the italicised text to make the sentence mean something...

Comment on Re: Parllel Processing Database Query Download Code

Replies are listed 'Best First'.
Re^2: Parllel Processing Database Query by ctilmes (Vicar) on Jul 28, 2009 at 12:55 UTC
But in most cases where I see people trying to speed up a program with perl and a database, they're doing the work in the wrong location. I find it uncommon to see speed problems when the tasks are divided up correctly. BTW, the latest versions of PostgreSQL include support for Perl inside the database with PL/Perl, so you can sometimes get the best of both worlds.	[reply]
Re^2: Parllel Processing Database Query by metaperl (Curate) on Jul 28, 2009 at 13:59 UTC
Just a related comment. When using ETL tools (e.g. Informatica, Ab Initio, Teradata, Talend, etc), the best practice is to do your joins, computations in those tools because you get the benefit of massively parallel processing. The exception would be if you have a parallel database engine like Netezza.	[reply]