Re^2: Thread local variables in Thread::Pool::Simple

Thanks, that "handled" my problem nicely.

Now I have a new problem, though: I'm building a file parser and I wanted to implement fairly granular multithreading by offloading individual file jobs to the worker threads. I had hoped that while one thread was waiting for a file to be read in, another would be able to parse a file that was already loaded.

I guess multithreading isn't the solution here: parsing my file tree used to take ~200 seconds, and with 4 worker threads it now takes ~690 seconds. Perhaps multiprocessing would work better. I know the process isn't disk-bound, because the parser's CPU usage sits at ~100%.

This leads to my next question: is there a multiprocessing equivalent of Thread::Pool::Simple that encapsulates the IPC? Or do I need to set up my own process pool manager with Unix pipes or TCP streams? Perhaps that could also take advantage of the multiprocessor machine this runs on (2x Opteron 252).
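For a sense of what the "roll your own with Unix pipes" option involves, here is a minimal sketch using only core Perl (`fork`, `pipe`, `waitpid`). It assumes the work can be split up front: files are dealt round-robin to a fixed number of child processes, and each child writes one tab-separated `file<TAB>result` line per file back to the parent over its own pipe. `parse_file` and `run_pool` are hypothetical names, and `parse_file` is a placeholder that just returns the file name's length. The sketch does not handle a child dying mid-job or rebalance uneven workloads.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the real per-file parser.
sub parse_file {
    my ($file) = @_;
    return length($file);    # placeholder "result"
}

# Split @files across $nworkers child processes; each child reports its
# results to the parent as "file\tresult" lines over a private pipe.
sub run_pool {
    my ($nworkers, @files) = @_;
    my @readers;
    for my $w (0 .. $nworkers - 1) {
        # Round-robin assignment of files to this worker.
        my @mine = @files[ grep { $_ % $nworkers == $w } 0 .. $#files ];
        pipe(my $r, my $wr) or die "pipe: $!";
        my $pid = fork();
        die "fork: $!" unless defined $pid;
        if ($pid == 0) {             # child: parse, report, exit
            close $r;
            for my $f (@mine) {
                print {$wr} "$f\t", parse_file($f), "\n";
            }
            close $wr;
            exit 0;
        }
        close $wr;                   # parent keeps only the read end
        push @readers, [ $pid, $r ];
    }

    # Collect everything in the parent; it is the only process that sees
    # the results, so it could also be the only database writer.
    my %results;
    for my $pair (@readers) {
        my ($pid, $r) = @$pair;
        while (my $line = <$r>) {
            chomp $line;
            my ($f, $res) = split /\t/, $line, 2;
            $results{$f} = $res;
        }
        close $r;
        waitpid($pid, 0);            # reap the finished child
    }
    return \%results;
}

my $res = run_pool(2, qw(a.txt bb.txt ccc.txt));
print "$_ => $res->{$_}\n" for sort keys %$res;
```

Reading the pipes one after another cannot deadlock, because the children only write and the parent reads each pipe to EOF; a child with a lot of output may simply block until the parent gets to it.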

Thanks,
Annirak

Re^3: Thread local variables in Thread::Pool::Simple by ikegami (Patriarch) on Sep 18, 2009 at 17:04 UTC
Don't threads use any available CPUs? Maybe not. Anyway, since Thread::Pool::Simple uses threads, the simplest solution is just to add `use forks;` early in your program (before `use Thread::Pool::Simple;`, at least). By the way, how often do you end up connecting to the database? Maybe the parent should do all the database work. (Why does a parser even need a database?) I wonder how well profilers deal with threads. ("Devel::NYTProf is not currently thread safe." Doh!)
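A sketch of the `use forks;` suggestion, assuming the CPAN modules forks and Thread::Pool::Simple are installed. The essential point is load order: forks is loaded before anything that pulls in threads, so the pool's workers should become processes instead of ithreads. Loading forks::shared as well is an assumption on my part, in case the pool's shared state needs it; `parse_one` is a hypothetical job handler, and I have not verified that Thread::Pool::Simple runs unchanged under forks.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Load forks first so that later "use threads" users get processes.
use forks;
use forks::shared;    # assumption: covers any threads::shared use in the pool
use Thread::Pool::Simple;

# Hypothetical per-file job; the real program would parse the file here.
sub parse_one {
    my ($file) = @_;
    print "pid $$ parsing $file\n";    # a different $$ per worker shows separate processes
}

my $pool = Thread::Pool::Simple->new(
    min => 4,                  # the "4 worker threads" from the question
    max => 4,
    do  => [ \&parse_one ],    # job handler called with each job's arguments
);
$pool->add($_) for @ARGV;      # one job per file named on the command line
$pool->join();                 # wait for all queued jobs, then stop the workers
```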
Re^4: Thread local variables in Thread::Pool::Simple by Annirak (Novice) on Sep 18, 2009 at 17:19 UTC
The database is connected to once per file. SQLite is supposed to handle multiple connections, but I don't know whether any buffering is in place or how well it handles multithreading. [Update: The parser is used for profiling files, and the results are stored in the database. I'm using an existing framework and, honestly, the database seems to make sense.] I had considered using a queue to bring results back to the parent thread and having the parent handle all the requests. That requires a lot more code restructuring, so I didn't take that approach initially. Right now, with threading making the run roughly 3.5 times slower, pursuing queued database insertions doesn't seem worthwhile. I suppose I could profile the code to check whether doing the DB commits in the parent would pay off, but since I don't have a framework in place for that, I expect it would be more work than just trying the queue. Just to clarify: if I put `use forks;` before `use Thread::Pool::Simple;`, will I get multiprocessing instead of multithreading?
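The queue approach described above can be sketched with the core threads and Thread::Queue modules (Perl must be built with thread support). Workers take file names from one queue and push results onto a second; only the parent drains the result queue, so in a real program it would be the single database writer. `parse_file` and `store_result` are hypothetical placeholders: `store_result` fills a hash where a DBI insert would go.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $jobs    = Thread::Queue->new();    # file names for the workers
my $results = Thread::Queue->new();    # parsed results back to the parent

# Hypothetical parser: returns a result string for one file.
sub parse_file { my ($file) = @_; return uc $file }

sub worker {
    while (defined(my $file = $jobs->dequeue())) {
        $results->enqueue("$file\t" . parse_file($file));
    }
    $results->enqueue(undef);          # tell the parent this worker is done
}

my $nworkers = 4;
my @workers  = map { threads->create(\&worker) } 1 .. $nworkers;

$jobs->enqueue($_) for qw(a.txt b.txt c.txt);
$jobs->enqueue(undef) for 1 .. $nworkers;    # one stop marker per worker

# Placeholder for a DBI insert: only the parent ever calls this.
my %stored;
sub store_result { my ($file, $res) = @_; $stored{$file} = $res }

my $done = 0;
while ($done < $nworkers) {
    my $item = $results->dequeue();
    if (!defined $item) { $done++; next }
    store_result(split /\t/, $item, 2);
}
$_->join() for @workers;
print "$_ => $stored{$_}\n" for sort keys %stored;
```

Because the parent is the only writer, SQLite never sees competing connections, and all inserts could also be grouped into fewer transactions.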
Re^5: Thread local variables in Thread::Pool::Simple by ikegami (Patriarch) on Sep 18, 2009 at 17:33 UTC
Hmm, I seem to remember SQLite having some really bad concurrency issues, as in blocking everything except the first connection or transaction, or something like that. Obviously, I'm not sure about this.
Re^3: Thread local variables in Thread::Pool::Simple by BrowserUk (Patriarch) on Sep 18, 2009 at 17:13 UTC
"I guess that multithreading isn't the solution because the parse of my file tree used to take ~200 seconds, and with 4 worker threads, it now takes ~690 seconds."
Care to show us the code and see if we cannot improve that for you?
Re^3: Thread local variables in Thread::Pool::Simple by Illuminatus (Curate) on Sep 20, 2009 at 19:00 UTC
As ikegami alludes to in a later post, SQLite has some issues with concurrency. There is no row-level or even table-level locking: any thread that begins a write locks out all others until it is done. This would be the same whether you use multiple threads or multiple processes.