in reply to Thread local variables in Thread::Pool::Simple

I don't see how to create a thread local variable using Thread::Pool::Simple.

The best solution I've come up with so far is to create a shared hash

Hint: If you want a thread-local variable, don't start by taking a thread-local variable and making it shared.

All variables are thread-local unless you explicitly share them.

my $dbh;

sub init {
    $dbh = DBI->connect(...);
    ...
}

my $pool = Thread::Pool::Simple->new(
    ...
    init => [ \&init ],
    ...
);
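To illustrate the point about sharing, here is a minimal sketch using only the core threads and threads::shared modules (it needs a perl built with thread support). The names $local, $shared and bump are made up for the demonstration and are not part of Thread::Pool::Simple. Each worker increments its own copy of the plain variable, so the parent's copy stays at 0, while the explicitly shared counter ends up at 3.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my $local = 0;             # not shared: every thread gets its own copy
my $shared :shared = 0;    # explicitly shared between all threads

# Hypothetical worker body, for illustration only.
sub bump {
    $local++;              # changes only this thread's private copy
    lock($shared);         # serialise access to the shared counter
    $shared++;             # visible to every thread
    return $local;         # what this thread saw in its own copy
}

# Start three workers, then collect what each one returned.
my @workers = map { threads->create( \&bump ) } 1 .. 3;
my @seen    = map { $_->join } @workers;

print "each worker's copy: @seen\n";
print "parent's copy:      $local\n";
print "shared counter:     $shared\n";
```

A database handle stored in a plain `my` variable behaves like $local here: whatever each worker assigns to it in its init callback is never seen by the other workers.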

Re^2: Thread local variables in Thread::Pool::Simple
by BrowserUk (Patriarch) on Sep 17, 2009 at 21:33 UTC

    What's with the double assignment?

    $dbh = $dbh = DBI->connect(...);

    A typo, or ...?


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Copy and paste bug. Fixed.
Re^2: Thread local variables in Thread::Pool::Simple
by Annirak (Novice) on Sep 18, 2009 at 16:48 UTC

    Thanks, that "handled" my problem nicely.

    Now I have a new problem, though: I'm building a file parser and I wanted to implement fairly granular multithreading by offloading individual file jobs to the worker threads. I had hoped that while one thread was waiting for a file to be read in, another would be able to parse a file that was already loaded.

    I guess that multithreading isn't the solution because the parse of my file tree used to take ~200 seconds, and with 4 worker threads, it now takes ~690 seconds. Perhaps multiprocessing would work better. I know that the process isn't disk limited because the CPU usage for the parsing process sits at ~100%.

    This leads to the next question: Is there a multi-processing equivalent to Thread::Pool::Simple, which encapsulates IPC? Or do I need to set up my own process pool manager with Unix pipes/TCP streams? Perhaps this could take advantage of the multi-processor machine that this is running on (2x Opteron 252).

    Thanks,
    Annirak

      Don't threads use any available CPUs? Maybe not.

      Anyway, since Thread::Pool::Simple uses threads, the simplest solution is just to add use forks; early in your program. (Before use Thread::Pool::Simple;, at least.)
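A sketch of that load order, assuming the forks module from CPAN is installed. forks supplies the same threads->create / ->join interface but runs each "thread" as a separate process, which the PID comparison below is meant to make visible. The commented-out line only marks where Thread::Pool::Simple would be loaded; nothing here has been tried against the actual parser.

```perl
use strict;
use warnings;

use forks;    # CPAN module; load it before anything that loads threads.pm
# use Thread::Pool::Simple;    # would be loaded after this point

my $parent_pid = $$;

# With forks loaded, this "thread" body runs in its own process.
my $child_pid = threads->create( sub { return $$ } )->join;

print $child_pid == $parent_pid
    ? "worker ran in the same process\n"
    : "worker ran in a separate process\n";
```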

      By the way, how often do you end up connecting to the database? Maybe the parent should do all the database stuff. ( Why does a parser even need a database? )

      I wonder how well profilers deal with threads. ( "Devel::NYTProf is not currently thread safe." doh! )

        The database is connected to once per file. SQLite is supposed to handle multiple connections, but I don't know if buffering is in place or how well it handles multithreading.

        [Update: The parser is used for profiling files; the results are stored in the database. I'm using an existing framework and, honestly, the database seems to make sense.]

        I had debated using a queue to bring results back to the parent thread to handle all the requests. This requires a lot more code restructuring, so I didn't follow that approach initially. Right now, with the 60% drop in performance for adding threading, it doesn't seem like it's worthwhile to pursue queuing database insertions. I suppose I could check if it's worthwhile to do db commits in the parent by doing some code profiling, but since I don't have a framework in place for that, I expect it would be more work than just trying the queue.
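For reference, the queue approach need not be a large restructuring. Below is a rough sketch using the core Thread::Queue module, not taken from the real parser: parse_file, the file names and the row format are all placeholders. Workers push their results onto one queue, and the parent is the only thread that would touch the database.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $results = Thread::Queue->new;    # workers -> parent

# Placeholder for the real per-file parse (hypothetical).
sub parse_file {
    my ($file) = @_;
    return "${file}:" . length $file;
}

sub worker {
    my (@files) = @_;
    $results->enqueue( parse_file($_) ) for @files;
    $results->enqueue(undef);        # sentinel: this worker is finished
    return;
}

# Two workers, each given its own batch of (made-up) file names.
my @batches = ( [ 'a.txt', 'bb.txt' ], [ 'ccc.txt' ] );
my @workers = map { threads->create( \&worker, @$_ ) } @batches;

my @rows;
my $pending = @workers;              # workers that have not sent a sentinel yet
while ($pending) {
    my $row = $results->dequeue;     # blocks until something arrives
    if ( !defined $row ) { $pending--; next }
    push @rows, $row;                # a real program would do the INSERT here
}
$_->join for @workers;

print "$_\n" for sort @rows;
```

Since only the parent writes, a single database handle is enough and the workers never contend for SQLite's write lock. Whether that recovers the lost time is something only measurement can show.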

        Just to clarify, if I put "use forks" in before "use Thread::Pool::Simple", then I'll get multiprocessing instead of multithreading?

      I guess that multithreading isn't the solution because the parse of my file tree used to take ~200 seconds, and with 4 worker threads, it now takes ~690 seconds.

      Care to show us the code and see if we cannot improve that for you?


      As ikegami alludes to in a later post, SQLite has some issues with concurrency. There is no 'row-locking', or even 'table-locking'. Any thread that begins a write locks out all others until it is done. This would be the same in multi-thread vs multi-process.
Re^2: Thread local variables in Thread::Pool::Simple
by Anonymous Monk on Sep 19, 2009 at 07:36 UTC
    On a beginner's tangent, how would you call init() from $pool in the code above? My best guess is:

    &{@{$pool->init}[0]}

    $pool->init returns the arrayref which is dereferenced and accessed with enclosing @{} and trailing [0], returning the coderef, which is in turn dereferenced with enclosing &{}?

      You can't. The object doesn't provide a means of accessing the value you pass to init. But if it provided an accessor, I'm with Anonymous Monk: use arrows.

      Say the accessor is named "init",

      $pool->init   # Returns the array ref passed to the constructor.
          ->[0]     # Access to the first element of the referenced array.
          ->();     # Call the referenced sub.
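A self-contained way to try both spellings, using a stand-in class. My::Pool and its init accessor are hypothetical and exist only for this demonstration; Thread::Pool::Simple itself offers no such accessor. The block form below uses `${ ... }[0]` (a single element) rather than the one-element slice `@{ ... }[0]` from the question, and adds the call parentheses.

```perl
use strict;
use warnings;

# Hypothetical stand-in class with an "init" accessor.
package My::Pool;

sub new {
    my ( $class, %args ) = @_;
    return bless { init => $args{init} }, $class;
}

sub init { return $_[0]{init} }    # returns the stored array ref

package main;

sub hello { return 'init ran' }

my $pool = My::Pool->new( init => [ \&hello ] );

# Block-dereference form: element 0 of the array ref, then call it.
my $r1 = &{ ${ $pool->init }[0] }();

# Arrow form: same steps, read left to right.
my $r2 = $pool->init->[0]->();

print "$r1 / $r2\n";
```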