in reply to Re^2: Multithreading leading to Out of Memory error
in thread Multithreading leading to Out of Memory error
I can't post all of it because it's on a classified network.
It should be perfectly possible (and legitimate) to produce a cut-down, but runnable version of your program that shows the generation of the file list, queue handling, and thread procedure(s) etc. without any of the proprietary logic being evident or discoverable. I.e. discard the logging; change search constants to innocuous values; rename variables that hint at the purpose or method of the code; etc.
Ask your mechanic to help you diagnose the problems with your car whilst you've left it at home in the garage, and see what reaction you get.
I'll have to look to see if it's any of the modules I am using that could be a problem
Switch is problematic and deprecated. (Nothing to do with threading.)
Spreadsheet::ParseExcel is known to leak badly even in single-threaded code.
DBI is (I believe) fine for multi-threaded use these days; but historically, many of the DBD::* modules (or their underlying C libraries) were not thread-safe.
Personally, I still advocate only using DBI from a single thread within multi-threaded apps. Set up a second queue and have your processing threads queue their SQL to a single thread dedicated to dealing with the DB.
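A minimal sketch of that second-queue approach (not your code; the SQL and table name are made up): the processing threads enqueue SQL strings, and one dedicated thread -- the only one that ever loads or touches DBI -- drains the queue:

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $dbQ = Thread::Queue->new;

my $dbThread = threads->create( sub {
    ## In real code the single DBI->connect( ... ) would happen here,
    ## inside this thread, so no handle ever crosses a thread boundary.
    my $executed = 0;
    while( defined( my $sql = $dbQ->dequeue ) ) {
        # $dbh->do( $sql );   ## only this thread ever touches DBI
        ++$executed;
    }
    return $executed;
} );

## Any processing thread queues its SQL rather than calling DBI itself:
$dbQ->enqueue( "INSERT INTO results VALUES( 1, 'found' )" );
$dbQ->enqueue( undef );                 ## end-of-work marker

my $done = $dbThread->join;
print "DB thread executed $done statement(s)\n";
```

The undef acts as an end-of-work marker; the DB thread exits cleanly once it sees it, which also gives you one obvious place to disconnect.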
My reaction to your further description is that I would be splitting up your all-purpose thread procedure into several independent, specialist thread procedures, each fed by a different queue; and I would be performing the filetype selection before queuing the names.
That would allow (for example) the .xls procedure to be the only one that requires Spreadsheet::ParseExcel, rather than loading that into every thread.
Ditto, by segregating out the DBI, DBD::ODBC and associated DBD::* modules and require-ing them into a separate, standalone thread fed by a queue, you reduce the size of all the other threads and ensure that you only need a single, persistent connection to the DB; and so remove another raft of possible conflicts in the process.
By making each thread dedicated to a particular type of file processing -- and only loading the code required for that particular processing into that thread -- you avoid duplicating everything into every thread -- thus saving some memory. You can also then easily track down which type of thread is leaking and, if necessary, arrange for that (type of) thread to be re-started periodically.
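A minimal sketch of that layout, with made-up filetypes and filenames: one queue and one specialist thread per extension, with the main thread doing the selection and allocation as it queues the names. Each specialist thread would require() only its own modules (e.g. only the 'xls' thread would load Spreadsheet::ParseExcel):

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my %Q = map { $_ => Thread::Queue->new } qw[ xls txt ];

my %worker = map {
    my $type = $_;
    $type => threads->create( sub {
        ## e.g. here the 'xls' thread alone would: require Spreadsheet::ParseExcel;
        my $processed = 0;
        while( defined( my $file = $Q{ $type }->dequeue ) ) {
            ++$processed;               ## real per-filetype processing here
        }
        return $processed;
    } );
} keys %Q;

## The main thread selects the filetype as it allocates the names:
for my $file ( 'a.xls', 'b.txt', 'c.xls' ) {
    my( $ext ) = $file =~ m[ \. (\w+) $ ]x;
    $Q{ $ext }->enqueue( $file ) if $ext and exists $Q{ $ext };
}

$Q{ $_ }->enqueue( undef ) for keys %Q;     ## one end marker per thread
my %counts = map { $_ => $worker{ $_ }->join } keys %worker;
print "$_: $counts{ $_ } file(s)\n" for sort keys %counts;
```

Because each Thread::Queue is created before the threads are spawned, the cloned objects in each thread still reference the same shared data; no extra sharing is needed.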
I'd also avoid the practice of first generating a big array of files and then dumping that array into a queue en masse. At the very least, arrange for the main thread to feed the queue from the array at a rate that prevents the queue growing to more than is necessary to keep the threads fed with work; 2 or 3 times the number of threads is usually a good starting point.
But why not start the threads immediately and then feed the queue as the files are discovered, thus overlapping their discovery with the processing?
And if you go for my dedicated filetype threads with their own queues, then you can also do the selection and allocation at the same time.
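A minimal sketch of that throttled feeding (the filenames and loop stand in for the real directory scan): start the workers first, then drip-feed the queue as names are "discovered", never letting the backlog grow past ~3 items per thread. (Newer versions of Thread::Queue can do the throttling declaratively via ->limit; the busy-wait on ->pending below is the portable fallback.)

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $THREADS = 4;
my $LIMIT   = 3 * $THREADS;     ## ~3 queued items per thread is plenty

my $Q = Thread::Queue->new;
my @workers = map threads->create( sub {
    my $n = 0;
    while( defined( my $item = $Q->dequeue ) ) {
        ++$n;                           ## real file processing here
    }
    return $n;
} ), 1 .. $THREADS;

## Stand-in for the directory scan: enqueue each name as it is found,
## pausing whenever the backlog is already enough to keep the workers busy.
for my $file ( map "file$_.dat", 1 .. 100 ) {
    threads->yield while $Q->pending > $LIMIT;
    $Q->enqueue( $file );
}
$Q->enqueue( ( undef ) x $THREADS );    ## one end marker per worker

my $total = 0;
$total += $_->join for @workers;
print "Processed $total files\n";
```

The queue never holds more than a couple of dozen names at once, however many millions of files the scan eventually turns up; that bounded backlog is what keeps the memory footprint flat.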
But without seeing the code, this is all little more than educated guessing.
Replies are listed 'Best First'.
- Re^4: Multithreading leading to Out of Memory error by joemaniaci (Sexton) on Jun 07, 2013 at 21:30 UTC
- by BrowserUk (Patriarch) on Jun 08, 2013 at 06:25 UTC