Re: Multithreading and DBI
by ikegami (Patriarch) on Sep 21, 2005 at 15:15 UTC
I doubt it will take 10 minutes if you place the requests simultaneously. The server is the bottleneck, and the same amount of work still needs to be done by the server. You might want to look into optimizing your queries.
If you want to try it out, the simplest way would be to write your program such that it accepts the .sql file name as an argument, then you can do:
# unix
myscript.pl file1.sql &
myscript.pl file2.sql &
myscript.pl file3.sql &
myscript.pl file4.sql &
myscript.pl file5.sql &
myscript.pl file6.sql &
# Windows
start perl myscript.pl file1.sql
start perl myscript.pl file2.sql
start perl myscript.pl file3.sql
start perl myscript.pl file4.sql
start perl myscript.pl file5.sql
start perl myscript.pl file6.sql
                    MAIN PROCESS
                         |
                         V
   -----------------------------------------------
   |         |                         |         |
   |         |                         |         |
 -----     -----                     -----     -----
 | 1 |     | 2 | ................... | 5 |     | 6 |
 -----     -----                     -----     -----
Thread     Thread                   Thread     Thread
  One        Two                      Five       Six
I know. I suggested something which is faster to code, less error-prone due to the smaller code size, much more maintainable, and more reliable given the state of threads in Perl.
Don't use threads unless you need to share data between threads. Fork and exec is simpler and faster for data-independent processes.
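A minimal sketch of the fork-and-exec approach, for illustration only: it assumes a Unix-like system and that myscript.pl (as in the reply above) takes the .sql file name as its only argument.

# Launch one child process per .sql file, then wait for all of them.
use strict;
use warnings;

my @sql_files = map { "file$_.sql" } 1 .. 6;
my @pids;

for my $file (@sql_files) {
    defined( my $pid = fork() ) or die "fork failed: $!";
    if ( $pid == 0 ) {                    # child
        exec( 'perl', 'myscript.pl', $file )
            or die "exec failed: $!";     # only reached if exec fails
    }
    push @pids, $pid;                     # parent remembers the child's pid
}

waitpid( $_, 0 ) for @pids;               # block until every child exits

Each child is a completely separate process with its own DBI connection, which is what makes this simpler and safer than sharing handles between threads.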
I'm not really a human, but I play one on earth.
flash japh
Re: Multithreading and DBI
by InfiniteSilence (Curate) on Sep 21, 2005 at 15:21 UTC
Yes, we have already done the maximum query optimization.
Re: Multithreading and DBI
by perrin (Chancellor) on Sep 21, 2005 at 16:00 UTC
Use forking rather than threads. Collect the results in files. Read the files at the end. Parallel::ForkManager can help.
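A rough sketch of this suggestion with Parallel::ForkManager (from CPAN); the file names and the run_query() helper are placeholders, not code from this thread.

# Each child runs one query and writes its rows to its own file;
# the parent reads the files after all children have finished.
use strict;
use warnings;
use Parallel::ForkManager;

my @sql_files = map { "file$_.sql" } 1 .. 6;
my $pm = Parallel::ForkManager->new(6);   # up to 6 concurrent children

for my $file (@sql_files) {
    $pm->start and next;                  # parent continues the loop
    # ---- child ----
    run_query( $file, "$file.out" );      # hypothetical: run query, save rows
    $pm->finish;                          # child exits
}
$pm->wait_all_children;

# Back in the parent: collect the results from disk.
for my $file (@sql_files) {
    open my $fh, '<', "$file.out" or die "can't read $file.out: $!";
    # ... process the saved results ...
}

Note that each child should open its own $dbh inside run_query(); a DBI database handle should not be shared across a fork.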
Re: Multithreading and DBI
by BrowserUk (Patriarch) on Sep 21, 2005 at 16:46 UTC
- Does your Oracle database run on a multi-processor machine?
If not, trying to overlap your queries will not produce a performance gain; rather, it will most likely slow the overall throughput substantially.
- Are the interface libraries to your database certified for multi-threaded use?
That is, are the libraries supplied by Oracle that DBI will call to interface to the DBM certified for multi-threaded use?
Many such DBM interface libraries internally use the process id of the caller to key intermediate storage. If multiple threads from the same process start calling with different, overlapping queries, the internal storage can become corrupted.
Neither of these is a limitation of Perl; they are just facts of life you will have to verify.
Having verified that your RDBMS and interface libraries are capable of supporting multi-threaded calling code, and that your DB server is able to overlap multiple concurrent queries and produce a performance increase, you will still have to verify that the DBD driver you connect through with DBI is capable of multi-threaded use.
Finally, DBI itself is not certified as thread-safe, so if you get problems, you'll have no one to turn to for help.
Overall, it seems to me that listening to those offering "known-to-work" alternatives, rather than being less than gracious about their attempts to help, might save you a lot of heartache.
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
The "good enough" may be good enough for the now, and perfection may be unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.
Re: Multithreading and DBI
by jZed (Prior) on Sep 21, 2005 at 15:17 UTC
I have no idea if you can gain the kind of time you want, but check out Dean Arnold's new DBIx::Threaded.
Re: Multithreading and DBI
by CountZero (Bishop) on Sep 21, 2005 at 19:05 UTC
Did you check the load on the database server? If it is high, adding multiple concurrent queries will probably slow it down even more.
Anyhow, you should first answer the question of where the real speed bottleneck is. If you run a huge lot of queries one after another from your .sql file, with lots of data being passed back and forth, it could be that the network is the limiting factor. Or it could be that the machine running your Perl script is having memory issues and has to swap memory to keep everything running (or rather, crawling). You see, the limited speed can come from different causes, and unless you search for the cause, it will be very much a gamble if you blindly pick one "solution" over another.
CountZero "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law
Re: Multithreading and DBI
by Ace128 (Hermit) on Sep 21, 2005 at 17:12 UTC
I was just wondering... having each .sql as a separate thread... wouldn't that still make it slow (if not slower) compared to running each one after another? I mean, it's just one server, and if you have multiple threads working on the db, the head on the hard disk has to move a lot to get the data for each thread... ?
Re: Multithreading and DBI
by graff (Chancellor) on Sep 22, 2005 at 03:10 UTC
Do you find that the execution times of individual queries are related in a consistent way to the amount of data returned? If so, how are you actually fetching the data through DBI? (Surely you could show us a few lines of code that actually do the fetching...) Some DBI fetching methods involve more overhead than others, and the impact could be felt in proportion to the amount of data being fetched.
And of course, if you're dealing with really massive amounts of data being returned from the database, well... that's going to take time no matter what -- not only for the actual transfer from oracle to your perl script, but also potentially for memory swapping of large perl data structures.
When you say the longest execution time for one query is 10 minutes, what steps does that time span actually include? Just the "$sth->execute()" step? The execute plus fetch? Other stuff?
You make it sound as though virtually all the execution time in the perl script is taken up by the queries, suggesting that it's all just a matter of waiting for the server to satisfy each query (whatever "satisfy" includes in your timing measurements).
So I think ikegami's initial reply is probably the best thing to start with: isolate the query activity from everything else, and compare timing results between running six query-only jobs in succession vs. running those jobs simultaneously.
Make it really simple: read the sql file, connect to oracle, prepare/execute/fetch, and store the query results to a disk file to make sure you got it right. If, as most of us expect, the server performs worse when trying to do multiple queries at once, then threading or forking is obviously not the way to go, and you'll need to look elsewhere to optimize (e.g. how bad would it be to take the longest/hardest query and cache the results on disk somewhere, so you don't repeat that query on every run of your "tooo long" script?)
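Such a query-only timing harness might look roughly like this; the DSN, credentials, and output format are placeholders you would replace with your own.

# Read one .sql file, run it, save the rows, and time each phase so
# you can see whether execute or fetch dominates.
use strict;
use warnings;
use DBI;
use Time::HiRes qw(time);

my $sql_file = shift or die "usage: $0 file.sql\n";
my $sql = do { local $/; open my $fh, '<', $sql_file or die $!; <$fh> };

my $dbh = DBI->connect( 'dbi:Oracle:mydb', 'user', 'pass',
    { RaiseError => 1 } );

my $t0  = time;
my $sth = $dbh->prepare($sql);
$sth->execute;
my $t1 = time;

open my $out, '>', "$sql_file.out" or die $!;
while ( my $row = $sth->fetchrow_arrayref ) {
    print {$out} join( "\t", map { defined $_ ? $_ : '' } @$row ), "\n";
}
my $t2 = time;

printf "%s: execute %.2fs, fetch %.2fs\n", $sql_file, $t1 - $t0, $t2 - $t1;

Running this once per .sql file in succession, and then six copies at once, gives you the sequential-vs-simultaneous comparison suggested above.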