I can't post all of it because it's on a classified network.

It should be perfectly possible (and legitimate) to produce a cut-down, but runnable version of your program that shows the generation of the file list, queue handling, and thread procedure(s) etc. without any of the proprietary logic being evident or discoverable. Ie. Discard the logging; change search constants to innocuous values; rename variables if they hint at the purpose or method of the code. etc.

Ask your mechanic help you diagnose the problems with your car; whilst you've left it at home in the garage and see what reaction you get.

I'll have to look to see if it's any of the modules I am using that could be a problem

Switch is problematic and deprecated. (Nothing to do with threading.)

Spreadsheet::ParseExcel is known to leak badly even in single-threaded code.

DBI is (I believe) fine for multi-threaded use these days; but historically, many of the DBD::* modules (or their underlying C libraries) were not thread-safe.

Personally, I still advocate only using DBI from a single thread within multi-threaded apps. Setup a second Q and have your processing threads queue their SQL to a single threaded dedicated to deal with the DB.

My reaction to your further description is that I would be splitting up your all-purposes thread procedure into several independent, specialist thread procedures each fed by a different queue and I would be performing the filetype selection process before queuing the names.

That would allow (for example) the .xls procedure to be the only one that requires Spreadsheet::ParseExcel, rather than loading that into every thread.

Ditto, by segregating out the DBI, DBI::ODBC and associated DBD::* modules and requireing them into a separate, standalone thread fed by a queue, you reduce the size of all the other threads and ensure that you only need a single, persistent connection to the DB; and so remove another raft of possible conflicts in the process.

By making each thread dedicated to a particular type of file processing -- and only loading that stuff required for that particular processing into that thread -- you avoid duplicating everything into every thread -- thus saving some memory. You can also then easily track down which type of thread is leaking and, if necessary, arrange for that (type of) thread to be re-started periodically.

I'd also avoid the practice of first generating a big array of files; and then dumping that array into a queue en-masse. At the very least, arrange for the main thread to feed the queue from the array at a rate that prevents the queue growing to more than is necessary to keep the threads fed with work; 2 or 3 times the number of threads is usually a good starting point.

But why not start the threads immediately and then feed the queue as the files are discovered, thus overlapping their discovery with the processing.

And if you go for my dedicated filetype threads with their own queues, then you can also do the selection and allocation at the same time.

But without seeing the code, this is all little more than educated guessing.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

In reply to Re^3: Multithreading leading to Out of Memory error by BrowserUk
in thread Multithreading leading to Out of Memory error by joemaniaci

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.