in reply to Threads and LWP::UserAgent question

Two things came to my mind while reading your question:
  1. Did you try LWP::Parallel? It might be just the right tool for your job
  2. When working with threads, try to keep the number of global variables minimal, that is, declare all variables with my where possible. Using strict helps you detect cases where you forgot a declaration. Global variables have to be cloned for each thread, which consumes quite some memory.

Apart from that, use strict; use warnings; also helps you avoid mistakes.

Re^2: Threads and LWP::UserAgent question
by ikegami (Patriarch) on Jun 09, 2008 at 05:01 UTC

    When working with threads, try to keep the number of global variables minimal

    Why? While that's good advice in general (it's easier to make sure that local variables are in the proper state than it is for global variables), I don't see why it's important for threads specifically. I can see this being an issue for *shared* variables (since their access needs to be controlled by locks), but other package, lexical, global and local variables are per-thread by default.
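    A minimal sketch of the per-thread default, assuming a perl built with ithreads support. The variable names $plain and $shared are made up for illustration and do not come from the original program.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

our $plain = 0;            # ordinary package variable: cloned into each new thread
my $shared :shared = 0;    # explicitly shared: one value visible to all threads

my $thr = threads->create(sub {
    $plain++;              # changes only this thread's private copy
    lock($shared);         # shared data needs a lock around updates
    $shared++;
    return $plain;         # the child's view of $plain
});

my $inside = $thr->join;

print "plain in child:  $inside\n";
print "plain in parent: $plain\n";
print "shared:          $shared\n";
```

    The parent's $plain is untouched by the child, while the change to $shared is visible after the join.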

      Why?

      Because global variables have to be copied for each created thread. Admittedly it's not such a big problem as with shared variables, but if you're careless they can accumulate to quite some size.

      One of the reasons that perl threads aren't exactly lightweight is that there are just too many global variables (think $_ $/ $\ $, $; @ARGV  @INC ...) which all have to be copied for each created thread. And it's one of the reasons why Perl 6 tries to reduce the number of global variables wherever possible, and invents context variables instead.

      Specifically @links (in the root node) looks weird to me - it's not a shared variable, but it's still a global variable (by virtue of being mentioned in sub XXX), and later declared as lexical in the main program. @queue is also global, although used only in XXX.

      If I ran my subroutines as external modules, wouldn't that make the global variable issue a non-issue? Doesn't each call from the top level live in its own thread?
        It will make it easier to keep an overview of your variables, but a separate package won't influence the memory issues I mentioned earlier. If you don't care about memory and speed, that's fine.

        If you do care, read on.

        When you use global variables, they are created in the package scope, and copied for each created thread. If you use my inside your threaded function, the memory for that variable will only be allocated as needed, and only after spawning a new thread.

        That being said, I do think your program isn't very clear about which variable is global and which is not, and it would benefit readability (and ease of debugging) to declare each and every variable, either with my (lexical, sometimes called "local" in other languages) or with our (global), and use strict; to enforce that.

        You also might take a look at Thread::Queue, which could be a replacement for your @queue variable.
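        A rough sketch of what that could look like, assuming a perl built with ithreads. The worker sub, the example.com URLs and the count of three workers are placeholders, not taken from the original spider.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $queue = Thread::Queue->new;    # thread-safe FIFO, replaces a hand-rolled @queue

# Hypothetical worker: pulls URLs until it sees the undef end marker.
sub worker {
    my @seen;                      # lexical, allocated inside the thread
    while (defined(my $url = $queue->dequeue)) {   # dequeue blocks until an item arrives
        push @seen, $url;          # a real spider would fetch $url here
    }
    return scalar @seen;
}

# Ask for scalar context explicitly so join returns the single count.
my @workers = map { threads->create({ context => 'scalar' }, \&worker) } 1 .. 3;

$queue->enqueue("http://example.com/page$_") for 1 .. 10;
$queue->enqueue(undef) for @workers;    # one end marker per worker

my $total = 0;
$total += $_->join for @workers;
print "processed $total URLs\n";
```

        The queue does its own locking, so the workers need no explicit lock calls, and each URL is handed to exactly one worker.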

        You're asking the wrong person. You asked in response to a post that boils down to "What issue with globals and threads?"

        Don't get me wrong. Global variables should be avoided. The solution is to properly scope your variables with my, not to make every function a module. That doesn't even make a variable less global (although it does lessen the chance of a name collision).
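        To illustrate that last point, here is a small sketch. The package name Spider::Helper and the sub pending are invented for the example.

```perl
use strict;
use warnings;

package Spider::Helper;    # hypothetical package name

our @queue = ('a');        # package variable: still global, just with a longer name

sub pending {
    my $count = @queue;    # lexical: visible only inside this sub
    return $count;
}

package main;

push @Spider::Helper::queue, 'b';        # any code can still reach the package variable
print Spider::Helper::pending(), "\n";   # counts both elements
```

        Moving @queue into a package only changes its fully qualified name. $count, by contrast, cannot be reached from outside pending at all.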

Re^2: Threads and LWP::UserAgent question
by gatito (Novice) on Jun 09, 2008 at 15:41 UTC
    1) I saw the module and briefly read about it but have not tried to use it. I suppose it might work, but I took my route as I figured it would be more robust (with multithreading - ha!) once I scale up to 10+ spiders, each accessing a database.

    2) Will do