PerlMonks  

threads: spawn early to avoid the crush.

by BrowserUk (Patriarch)
on Mar 02, 2006 at 13:14 UTC ( [id://533874]=perlmeditation )

If you are using threads, do as little as possible that consumes memory in your main thread, and that includes initialising data, before you spawn your threads. Here are timing and memory usage stats from two consecutive runs of a simple threaded script. The only difference between them is the relative position of two lines of code:

c:\test>junk

Image Name                   PID Session Name     Session#    Mem Usage
========================= ====== ================ ======== ============
tperl.exe                  10172                         0     64,840 K
Taken 3.278383 seconds

c:\test>junk

Image Name                   PID Session Name     Session#    Mem Usage
========================= ====== ================ ======== ============
tperl.exe                   2924                         0    173,516 K
Taken 8.761321 seconds

For the first run, the code looked like this:

#! perl -slw
use strict;
use threads;
use Time::HiRes qw[ time ];

sub simplesub { sleep 10, return 1 }

my $start = time;

my @threads = map{ threads->create( \&simplesub ) } 1 .. 10;

my @array = 0 .. 1e5;
my %hash = 1 .. 1e5;

system qq[tasklist /fi "pid eq $$"];
printf "Taken %f seconds", time() - $start;

$_->join for @threads;

For the second run, like this:

#! perl -slw
use strict;
use threads;
use Time::HiRes qw[ time ];

sub simplesub { sleep 10, return 1 }

my $start = time;

my @array = 0 .. 1e5;
my %hash = 1 .. 1e5;

my @threads = map{ threads->create( \&simplesub ) } 1 .. 10;

system qq[tasklist /fi "pid eq $$"];
printf "Taken %f seconds", time() - $start;

$_->join for @threads;

So, another secret to (somewhat) lighter threads is to ensure that you spawn your threads early in the program, before you generate lots of data structures in your main thread. Everything that exists in your main thread's memory at the time of spawn (including everything created by all the packages you have used, whether the use appears physically before or after the point of spawn!) will be cloned wholesale into the memory of each thread you spawn!

That has the downside that you don't always want to spawn your threads right at the start of your code as you often don't have everything they need at that point. That in turn, requires that you arrange for your threads to wait for the information they require, and some method of passing that information to them at some later point once it is available. And that introduces the complications of queues and shared memory and synchronisation.

What I've been looking for for a while now is a simple interface to a mechanism that allows me to spawn my threads early, with new, clean, uncloned, interpreters, in a suspended state and then 'resume' them, passing any parameters they require using a simple, clean interface.

my( $Xthread ) = threads->create( { suspended => 1 }, \&Xthread );
my( $Ythread ) = threads->create( { suspended => 1 }, \&Ythread );

... Do other stuff that gets me the parameters for X

$Xthread->resume( $arg1, $arg2 );

... Generate/fetch/calculate args for Y

$Ythread->resume( $Yarg1, $Yarg2 );

... tum te tum

my( @Yresults ) = $Ythread->join;
...
my( @Xresults ) = $Xthread->join;

Does anyone have suggestions for how to go about doing this?

If the threads could be 're-resumed' with different parameters that would be even better.
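As a rough illustration of the interface wished for above, here is a minimal sketch built only on threads and Thread::Queue. The package name SuspendedThread and its create/resume/join methods are hypothetical, not an existing module. The thread is spawned at once, then blocks on a queue until resume() hands over its arguments. It assumes the code ref is already known at spawn time and that the arguments are plain scalars; it does not give a "clean, uncloned" interpreter, it only lets the spawn happen while the main thread is still small.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Hypothetical wrapper (not a real CPAN module): the thread is created at
# once, but blocks on a queue until resume() supplies its arguments.
package SuspendedThread;

sub create {
    my ( $class, $code ) = @_;
    my $queue = Thread::Queue->new;

    # List context here, so that join() can return a list of results.
    my ($thread) = threads->create( sub {
        my $count = $queue->dequeue;                 # sleeps until resume()
        my @args  = map { $queue->dequeue } 1 .. $count;
        return $code->(@args);
    } );
    return bless { thread => $thread, queue => $queue }, $class;
}

sub resume {
    my ( $self, @args ) = @_;
    $self->{queue}->enqueue( scalar @args, @args );  # arg count, then the args
    return $self;
}

sub join { return $_[0]{thread}->join }

package main;

# Spawn first, while the main thread holds almost nothing.
my $Xthread = SuspendedThread->create(
    sub { my $sum = 0; $sum += $_ for @_; return $sum }
);

my @big = 0 .. 1e5;    # built after the spawn, so never cloned into the thread

$Xthread->resume( 1, 2, 3 );
my ($Xresult) = $Xthread->join;
print "X returned $Xresult\n";
```

Making the thread 're-resumable' would mean wrapping the body in a loop and returning results through a second queue instead of through join.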


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Replies are listed 'Best First'.
Re: threads: spawn early to avoid the crush.
by vagnerr (Prior) on Mar 02, 2006 at 13:32 UTC
    We had a similar issue in the past with a perl daemon that forked multiple working children to handle various tasks. Obviously the larger your forking process is, the larger the children. His solution was to have the daemon immediately fork off the "main" process and keep the small initial parent just for the job of forking new processes. If the main program needed another process it would ask the original parent to do it for it. As all the parent was doing was forking new processes it remained nice and small.
    He also got to call it a really cool name. "The Motherforker!" :-)


    _____________________
    Remember that amateurs built Noah's Ark. Professionals built the Titanic.
      If your daemon was forking, what you did was probably not a good idea. Although the processes appear to be larger when forking from a larger process, most of that memory is shared by copy-on-write. This is not the case with Perl threads, which is why BrowserUK's advice is correct for them.
      If the main program needed another process it would ask the original parent to do it for it.

      Could you explain that in a bit more detail for me? I've never done much with fork, especially in Perl.

      • How does the main code inform its parent when it is time to create another process?
      • If there can be multiple other processes that might need to be run, how does the main code tell the parent which one to create?
      • How does the new process get its parameters?
      • How does the main code retrieve the results from the new process?

        Here's a simple answer to interprocess communication: use pipes.
        #!/usr/bin/perl -w
        #use forking open to start 3 more processes
        unless (open X, "-|") { print 1+5; exit };      # X=1+5
        unless (open Y, "-|") { print 2*3; exit };      # Y=2*3
        unless (open Z, "-|") { print <X>+<Y>; exit };  # Z=X+Y
        print "Z = ";
        print <Z>."\n";
Re: threads: spawn early to avoid the crush.
by zentara (Archbishop) on Mar 02, 2006 at 15:54 UTC
    I've been doing it that way in all my thread examples. :-) What I do is first set up the shared variables, then IMMEDIATELY create the threads and put them in a sleep loop, waiting for a signal (thru shared vars) to wake up and start running. I usually use a 1 second loop in the threads, which may seem sluggish, but you could reduce it to milliseconds if desired.

    Even in the latest Gtk2 code, which allows some fancier thread work, thru their thread-safety mechanism, experts like muppet still say the best way is to do it like you suggest. Create the threads first, before anything else is declared, and you will have few problems.

    This is the basic thread I use, you can either hard code the threads code, or pass it via shared-variable and eval it. When the thread is created, it goes right to sleep, and wakes up once per second to see if it needs to awake. The one drawback with this method, is you need to clean them up when exiting......wake them up, and tell them to die, then join them.

    sub work{
        my $dthread = shift;
        $|++;
        while(1){
            if($shash{$dthread}{'die'} == 1){ goto END };

            if ( $shash{$dthread}{'go'} == 1 ){

                eval( system( $shash{$dthread}{'data'} ) );

                foreach my $num (1..100){
                    $shash{$dthread}{'progress'} = $num;
                    print "\t" x $dthread,"$dthread->$num\n";
                    select(undef,undef,undef, .5);
                    if($shash{$dthread}{'go'} == 0){last}
                    if($shash{$dthread}{'die'} == 1){ goto END };
                }

                $shash{$dthread}{'go'} = 0; #turn off self before returning
            }else
                { sleep 1 }
        }
    END:
    }
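The snippet above leaves out the setup of the shared hash and the cleanup it mentions. A cut-down, self-contained sketch of the same go/die flag pattern might look like the following. The flat %ctl hash, the doubling "work" and the helper names are illustrative assumptions, not the original code, and the polling interval is shortened to 50 ms.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

# Control block shared between main and the worker (names are illustrative).
my %ctl : shared = ( go => 0, die => 0, data => 0, result => 0 );

sub worker {
    while (1) {
        return if $ctl{die};                      # told to exit
        if ( $ctl{go} ) {
            $ctl{result} = $ctl{data} * 2;        # stand-in for the real work
            $ctl{go}     = 0;                     # turn self off: job finished
        }
        else {
            select( undef, undef, undef, 0.05 );  # doze for 50 ms, then re-check
        }
    }
}

# Hypothetical helpers wrapping the wake-up / clean-up steps.
sub start_worker {
    @ctl{qw( go die )} = ( 0, 0 );
    return threads->create( \&worker );
}

sub run_job {
    my ($value) = @_;
    $ctl{data} = $value;
    $ctl{go}   = 1;                               # wake the sleeping thread
    select( undef, undef, undef, 0.05 ) while $ctl{go};
    return $ctl{result};
}

sub stop_worker {
    my ($thread) = @_;
    $ctl{die} = 1;                                # tell it to die...
    return $thread->join;                         # ...then join it
}

my $thread = start_worker();    # created before any heavy data exists
print "result: ", run_job(21), "\n";
stop_worker($thread);
```

The worker writes the result before clearing the go flag, so the main thread never reads a stale value once it sees go drop back to 0.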

    I'm not really a human, but I play one on earth. flash japh

      Yes. I've been using and describing these techniques here for 3 years or more, but I am looking for a way to encapsulate the messy and fiddly business of shared data, access control and the process of spawning 'clean&light' threads into a module with a simple interface. I've gotten close a couple of times, but there is always something that I haven't found a good way to do.

      Your example code misses the point. In a nutshell, the problem is

      • how to pass a coderef + parameters + context to a pre-existing dormant thread. And how to return the thread handle from that thread to the calling code for joining and results retrieval.

        Possible interface:

        use threads::lite;
        my @threads = threads::lite->spawn( 10 );
        ...
        ## Then when I know what I want a thread to do
        my $Xthread = pop @threads;
        $Xthread->run( \&doX, @Xargs );
        ....
        my @Xresults = $Xthread->join;
      • Or: How to start a thread factory thread early so that you have a clean thread and then later pass a coderef + parameters + context to that factory thread; have it spawn a new (clean) thread; and then return the handle from the new thread to the caller for subsequent joining and results retrieval.

        Possible interface:

        use threads::lite;
        my $threadFactory = threads::lite->genFactory;
        ....
        my( $Xthread ) = $threadFactory->create( \&doX, @Xargs );
        my( $Ythread ) = $threadFactory->create( \&doY, @Yargs );

      Don't take any notice of the module/method names shown. I could care less whether they are camelCase() or hugely_verbose_with_under_scores()--though I have my preferences like others, and I'd prefer that they weren't Hugely_Verbose_With_Camel_Case_And_Underscores() as I've encountered occasionally.

      The crux of the matter is how to create light threads (which means early), but use them when I need them; and without having to reinvent the wheel of queues and synchronisation and all that good stuff in every program; and without cloning everything in my current thread into every thread I spawn.

      Ie. A simple interface to lightweight, 'only-clone-what-is-needed' threads.
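One partial workaround for the "pass a coderef to a dormant thread" problem is to pass the *name* of a sub instead, since named subs exist in every thread from the moment it is spawned. The sketch below is only that, a sketch: spawn_pool, submit, collect and stop_pool are hypothetical helpers, not the threads::lite interface imagined above, and it assumes a Thread::Queue recent enough to accept array refs (which it clones into shared structures). It does not handle closures or anonymous code refs created after the pool is spawned.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $jobs    = Thread::Queue->new;   # main -> workers: [ id, sub name, args... ]
my $results = Thread::Queue->new;   # workers -> main: [ id, return values... ]

# Work routines are ordinary named subs, so every thread already has a copy.
sub square { return $_[0] * $_[0] }
sub add    { my $sum = 0; $sum += $_ for @_; return $sum }

sub pool_worker {
    while ( defined( my $job = $jobs->dequeue ) ) {    # undef means shut down
        my ( $id, $name, @args ) = @$job;
        my $code = main->can($name);                   # look the sub up by name
        $results->enqueue( [ $id, $code ? $code->(@args) : () ] );
    }
}

# Hypothetical helpers standing in for spawn/run/join.
sub spawn_pool {
    my ($n) = @_;
    return map { threads->create( \&pool_worker ) } 1 .. $n;
}

sub submit {
    my ( $id, $name, @args ) = @_;
    $jobs->enqueue( [ $id, $name, @args ] );
}

sub collect {
    my ( $id, @ret ) = @{ $results->dequeue };
    return ( $id, @ret );
}

sub stop_pool {
    my @pool = @_;
    $jobs->enqueue(undef) for @pool;    # one shutdown marker per worker
    $_->join for @pool;
}

my @pool = spawn_pool(2);    # early, while the main thread is still small

submit( X => 'square', 7 );
submit( Y => 'add', 1, 2, 3 );

for ( 1 .. 2 ) {
    my ( $id, @ret ) = collect();
    print "$id returned @ret\n";
}

stop_pool(@pool);
```

Results come back tagged with the caller's id rather than through a per-thread join, because any idle worker may pick up any job.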


        Isn't it amazing how far we've come so far? I'm sure you've examined the latest threads-related nodes where it is discussed that the new separate threads::shared module now has the ability to bless shared refs. I'm thinking that object could be the basis of the thread object. You create a shared hash, then bless it into an object, then use that object as the basis of the thread object. Add to the blessed object the sleeping worker thread (automatically), which shares that shared hash ref, then make up the methods, etc, to pass code to be evaled, go to sleep, wake up, die, join, etc.

        I'm not much into making objects, but that would be my first attempt.

        You know more about it than me, I'm pretty content to stick with functional worker threads which I control thru a hash.


        I'm not really a human, but I play one on earth. flash japh

        I like this idea. A suggestion for the interface:

        use threads::lite;
        my $factory = threads::lite->new( -threads => 10 ); #reserve 10 threads
        my $x_thr = $factory->create( \&doX, \@Xargs, \%optional_configs );
        my $y_thr = $factory->create( \&doY, \@Yargs );

        The general ideas are

        1. Spawn a number of threads up-front, if you use more, they are spawned as needed. When the factory is created, the threads could run something like:
          sub _default_thread {
              my $thr_id = shift;
              if (defined $s_coderef[$thr_id] && ref $s_coderef[$thr_id] eq 'CODE') {
                  $s_coderef[$thr_id]->(@{ $s_param[$thr_id] });
                  $s_coderef[$thr_id] = undef;
              }
              else { sleep(1) }
          }
        2. Pass arguments as a single ARRAYref, opening the door for per-thread configuration.

        This is just off the top of my head, so take it as such.

        <-radiant.matrix->
        A collection of thoughts and links from the minds of geeks
        The Code that can be seen is not the true Code
        I haven't found a problem yet that can't be solved by a well-placed trebuchet
Re: threads: spawn early to avoid the crush.
by Eyck (Priest) on Mar 02, 2006 at 16:06 UTC
    I've always been told NOT to do what you're advising here. If you're forking off many children and put the same data into every one of those, you get No#Children*DataStructureSize memory consumption; if you initialise everything in the master thread, then the children will have their own copy-on-write copy of the data structure, and you get 1*DataStructureSize memory consumption.

    Although it looks like your advice, when it comes to perl and not general computing, is right:

    paranoid% perl -w junk1.pl
    Taken 1.603796 seconds%
    paranoid% perl -w junk2.pl
    Taken 4.308179 seconds%

    I'm still avoiding threads with perl, there's no good reason for lib authors to make their libs thread-safe, thus your perl apps will never be thread-safe, and, there is basically nothing that threading has to offer (well, headaches and longer development times, but if we wanted that, we would be programming java)
    (But we do get a lot of people who read a book about GUIs, and they can't seem to live without threads these days)


    A computer is a state machine. Threads are for people who can't program state machines. -- Alan Cox

      Although it looks like your advice, when it comes to perl and not general computing, is right:

      Without wishing to offend you, this is a Perl forum, and the subject is Perl threads. Ithreads are not forked processes; not pthreads; nor green threads; nor any other flavour. COW is not available everywhere, and Ithreads do not (yet) make use of COW anywhere that I am aware of. As such, your prior experience is of little value in a thread relating to them.

      I'm still avoiding threads with perl, there's no good reason for lib authors to make their libs thread-safe, thus your perl apps will never be thread-safe, and, there is basically nothing that threading has to offer (well, headaches and longer development times, but if we wanted that, we would be programming java) (But we do get a lot of people who read a book about GUIs, and they can't seem to live without threads these days)

      I rarely bother to read threads about web/cgi and related technologies because they don't interest me.

      Again, without wishing to offend you, you must have seen the word "threads" in the title of the post. You obviously have no interest in threads, so why bother to expend effort to respond? Especially in such a negative vein. Isn't it easier to simply note the subject and move on?

      FYI. Threads have many, many uses beyond "GUIs", though they are one good use. And despite your undisguised attempts to imply that GUI applications are somehow inferior, for the vast majority of computer users, as opposed to computer technologists and geeks, GUI applications are easier to use and allow them to use their computer systems as tools to perform their primary job roles without having to become computer specialists.

      I started to try and explain the way iThreads work, and how they removed the need for the vast majority of modules to have to be coded to be thread safe. At a conservative guess, 90% of the modules on cpan work perfectly well in conjunction with threads, without any special care needing to be taken by their users, beyond not attempting to share objects across threads.

      Then I realised that, going by the tone of your post, you simply wouldn't care. You have no interest in threads and your mind is closed to their possibilities. So, I won't bother.

      A computer is a state machine. Threads are for people who can't program state machines. -- Alan Cox

      I've no idea who Alan Cox is, but it is apparent that he is just as ill informed on the subject.


      there is basically nothing that threading has to offer (well, headaches and longer development times,

      I see just the opposite. When you have a situation where you need to share data between separate processes, it is easier for me to use threads and threads::shared. I suppose if you are used to setting up safe shared memory segments for IPC, then it may be easier for you. But I still see in that situation, threads and shared data are easier to set up, and safer. I shudder when I see those shared memory segments which are not cleaned up. I've seen some shared mem segment apps, which are supposed to clean up after themselves, leave shared memory segments intact after a kill 9 or a control-c. I will take threads any day. Additionally, shared mem segments work differently on win32 and unix/linux, so you need to code twice. Whereas threads work the same on win32 and unix/linux, as far as perl code is concerned.

      And there is the option of dealing with a gazillion pipes....yuck.

      But I agree with you that if you don't need to share data, forking is preferred over threads.


      I'm not really a human, but I play one on earth. flash japh

      First off, as far as forking is concerned, you're right: by doing the set-up work in the parent, it gets copied into the kids in shared copy-on-write memory. Thus, there is a huge runtime boon to doing that - both in memory and CPU.
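For contrast with the threaded scripts in the original post, here is a minimal sketch of the forking case on a Unix-like system: the parent builds the data once, and each forked child reads its slice without any explicit copy, handing its answer back through a pipe. The helper name fork_sum and the slice sizes are made up for illustration. How much memory actually stays shared is an open question here, since Perl can write to a scalar's bookkeeping fields even when the program only reads it.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: fork a child that sums one slice of the parent's data
# and writes the answer back down a pipe.
sub fork_sum {
    my ( $data, $from, $to ) = @_;
    pipe( my $reader, my $writer ) or die "pipe failed: $!";
    my $pid = fork;
    die "fork failed: $!" unless defined $pid;

    if ( $pid == 0 ) {                 # child: sees the parent's data as-is
        close $reader;
        my $sum = 0;
        $sum += $_ for @{$data}[ $from .. $to ];
        print {$writer} "$sum\n";
        close $writer;                 # flushes the result into the pipe
        POSIX::_exit(0);               # leave without running END blocks
    }

    close $writer;                     # parent keeps only the read end
    return ( $pid, $reader );
}

my @data = 1 .. 100_000;               # set-up work done once, in the parent

my @kids;
for my $half ( 0, 1 ) {
    my $from = $half * 50_000;
    push @kids, [ fork_sum( \@data, $from, $from + 49_999 ) ];
}

my $total = 0;
for my $kid (@kids) {
    my ( $pid, $reader ) = @$kid;
    chomp( my $part = <$reader> );
    close $reader;
    waitpid $pid, 0;
    $total += $part;
}
print "total = $total\n";
```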

      That said, threads are another beast. As BrowserUk points out, these are perl threads, which make them a slightly different beast than regular win32- or p- threads.

      They're different enough that I don't bother using them. However, I look forward to perl 6 partly for the hopes that by putting threading into the base language, we might get some good, lightweight threads where the types of workarounds that BrowserUk mentioned in his OP are no longer necessary. Of course, what fixing threads does to PONIE in threaded situations ... well, I don't know.

      I really wish I had a thread-safe perl where I could just do stuff in parallel and not have to worry about inter-thread communication. I have some very parallelisable tasks in my code which could really gain from this, especially when it's running on multi-CPU machines (usually 4-way machines). Unfortunately, I'm using blessed references all over the place, and the overhead probably would kill me.

      A computer is a state machine. Threads are for people who can't program state machines. -- Alan Cox

      I'm assuming this is the Alan Cox in question? I suppose it's all well and good for someone who hacks OS kernels for fun and profit to make such statements. However, as someone who has also hacked kernels (including of the realtime, SMP kind) for fun and profit, I'd adjust Mssr. Cox's assertion a bit:

      Threads are for people who have better things to do than program state machines.

      However, if Thread::Apartment is as capable as current testing indicates, then I'll agree with your assertion that "there's no good reason for lib authors to make their libs thread-safe". Because they won't have to, assuming they're reasonably OO Perl. Just pop them into an apartment thread, and call the methods and/or invoke its closures as needed.

      Yes I realize there are issues apartment threading can't solve. But I've managed to get some threads-hostile DBI drivers to behave, and hope to have Tk working soon, which indicates many otherwise threads hostile modules should be supportable.

Re: threads: spawn early to avoid the crush.
by acid06 (Friar) on Mar 03, 2006 at 02:17 UTC
    Have you looked at Thread::Isolate?
    It might just be what you need, since it creates a so-called "mother thread" that should hold a cleaner state of the perl interpreter.


    acid06
    perl -e "print pack('h*', 16369646), scalar reverse $="

      Yes, but any solution that uses string eval means that you lose all the compile-time checking of the code contained in the strings(*), as well as being extremely slow if you call the code more than once.

      For example, if you wish to spawn a thread to handle client connects, the time spent re-evaling the code to run in the thread, will leave your main thread unresponsive to accept new connections for too long.

      And when things go wrong in your threads, you are left with no clues as to what and where.

      It also uses Storable freeze/thaw combinations to pass data to/from/between threads. This is even slower than shared data; doesn't handle large volumes well; and makes assumptions about what the data will contain.
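To make the freeze/thaw cost concrete, here is a minimal sketch of the general technique of passing a nested structure between threads as a serialised string. It is not Thread::Isolate's actual code; send_struct and recv_struct are hypothetical helper names, and the word-count job is just a stand-in. Every message pays for a full serialise on one side and a full rebuild on the other, which is where the slowness on large volumes comes from.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;
use Storable qw( freeze thaw );

# Hypothetical helpers: serialise on the way in, rebuild on the way out.
sub send_struct {
    my ( $queue, $data ) = @_;
    $queue->enqueue( freeze($data) );    # one flat string crosses the boundary
}

sub recv_struct {
    my ($queue) = @_;
    return thaw( $queue->dequeue );      # blocks until a message arrives
}

my $to_worker   = Thread::Queue->new;
my $from_worker = Thread::Queue->new;

my $worker = threads->create( sub {
    my $job = recv_struct($to_worker);
    my %count;
    $count{$_}++ for @{ $job->{words} };
    send_struct( $from_worker, \%count );
} );

send_struct( $to_worker, { words => [qw( spawn early spawn light )] } );

my $count = recv_struct($from_worker);
$worker->join;

print "$_ => $count->{$_}\n" for sort keys %$count;
```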

      I admire gmpassos greatly for the attempt, but it doesn't really work well in use.

      (*) IMO, a much better reason for avoiding string eval than "security issues".


Re: threads: spawn early to avoid the crush.
by graff (Chancellor) on Mar 03, 2006 at 03:53 UTC
    It's a funny coincidence (but I'm not laughing yet), that I saw this post at the end of a day where I learned that I may need to switch from perl built without threads to perl built with threads, in order to get DBD::Oracle to work (cf. DBD::Oracle and threading on a web server).

    So your findings about runtime and memory usage led me to ponder how likely it would be that some simple-minded perl script using DBI to do stuff with data from Oracle tables might get kind of ugly, just because a simple-minded perl hacker doesn't know about, think about, or have a choice regarding the kind of coding adjustment you demonstrated in your benchmarks.

    For example, someone decides to load a lot of data from a file before connecting to the database -- and then DBD::Oracle starts doing stuff with threads "under the table", completely unbeknownst to the hapless programmer. (Maybe DBD::Oracle doesn't really do stuff with threads, but then I don't understand why it seems to need thread support...)

    Anyway, part of what you're hoping for seems unattainable, unless you accept a trade-off: you can economize on memory and runtime if you can specify up front exactly how many threads you intend to use. That's great, but isn't there a whole class of apps whose defining trait is the ability to start new threads on an as-needed basis (not knowing in advance how many will be needed)?

    I am not someone who can go into detail on this, but roughly speaking, it sounds like what is needed is a way to define some sort of initial minimal state -- like a snapshot at startup -- such that each new thread starts out with just the minimal stuff defined therein; the parent process might know of specific data that a given thread would need, and would explicitly enable the access (whether copied or shared), but without this action from the parent, the thread must simply accumulate its own data separately.

    I don't have a clue how that would be implemented (for all I know, it might already be implemented!) -- but just in conceptual terms, that seems like what you'd want.

      As I've (I hope correctly) explained in the other thread, but I'll summarise here also to assuage any fears, threads created by calling C APIs (pthread_create()/CreateThread()/_beginthread()/other) will be completely unaffected by any memory considerations associated with Perl's cloning of Perl data.

      They would possibly be affected by the changes to process stacksize settings as I described in another recent thread--if they choose to use implicit stack size settings, but that's less common in C/C++ as they have access to the calls/parameters to use explicit values.

      Anyway, part of what you're hoping for seems unattainable, unless you accept a trade-off: you can economize on memory and runtime if you can specify up front exactly how many threads you intend to use. That's great, but isn't there a whole class of apps whose defining trait is the ability to start new threads on an as-needed basis (not knowing in advance how many will be needed)?

      Yes. That is the problem in a nutshell. Creating a "factory thread" very early in the script before anything else heavy is loaded is relatively easy to do. Even passing coderefs (which are allocated on the heap and (I believe) threadsafe) to that thread factory so that it can spawn the new thread from a lightweight environment should be possible. The real problem comes in transferring context, and parameters, and retrieving results.

      I'm convinced it is possible, I just haven't put the right set of incantations together yet. At least, I hope that is the problem.


Re: threads: spawn early to avoid the crush.
by zentara (Archbishop) on Mar 03, 2006 at 20:32 UTC
    Here is my attempt, which clearly shows the memory gain problem. There are 2 things to change to see the difference. In the current scenario, there is no memory gain, because the thread is used sequentially like a queue, because I wait for it to join after each run.

    On the other hand, if you set $self->{'reuse'} = 1; and comment out the line

    my @ReturnData = $z->{'thread'}->join;

    the threads will run in parallel and the memory climbs with each thread.

    So the trick is to find a way to have the main watch for each thread when it is ready to join, then relaunch it, instead of making another thread object. That is why it is easier with Tk, Gtk2, POE, etc, where you can have an event loop watching the thread. I am toying with the idea of how to put a self-contained method in the object to watch for the thread finishing its code run.
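For the "watch for each thread when it is ready to join" part, current versions of the threads module can report which threads have finished, via threads->list(threads::joinable). The sketch below polls that list from a plain loop standing in for an event loop. reap_finished is a hypothetical helper name, and this only shows the watching step, not the relaunching or the memory behaviour discussed here.

```perl
use strict;
use warnings;
use threads;

# Hypothetical helper: join every thread that has already finished and
# return a hash ref of { thread id => first return value }.
sub reap_finished {
    my %done;
    for my $thr ( threads->list(threads::joinable) ) {
        my $tid = $thr->tid;
        ( $done{$tid} ) = $thr->join;
    }
    return \%done;
}

# Three short-lived workers; each returns the square of its number.
my @workers = map {
    my $n = $_;
    threads->create( sub { return $n * $n } );
} 1 .. 3;

my %results;
while ( scalar( keys %results ) < scalar(@workers) ) {
    my $done = reap_finished();
    @results{ keys %$done } = values %$done;
    select( undef, undef, undef, 0.05 );    # stand-in for one turn of an event loop
}

print "thread $_ returned $results{$_}\n" for sort { $a <=> $b } keys %results;
```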

    You could set the thread to be non-reuse and detach it. Then have the Zthread object store the return value in its object. Then the main program would just have to wait an amount of time, and get the thread returns out of the Zthread object, and undef the object. That will be my next step.


    I'm not really a human, but I play one on earth. flash japh
Re: threads: spawn early to avoid the crush.
by zentara (Archbishop) on Mar 03, 2006 at 15:45 UTC
    After reading all the replies, it dawned on me that no one really mentioned the problem of the memory gains that can occur if you use "disposable threads in perl". I've tried a few times to set up threads that I try to undef after joining, or after they have otherwise finished. I found that you need to reuse the thread itself, else memory use will start creeping up. That definitely is something to consider in your "thread-launcher idea". I don't know if it can be done. Threads are like Tk objects, they need to be reused, and don't work well in the create-destroy cycle.

    I'm not really a human, but I play one on earth. flash japh

      Yeah. I wish I understood, or one of the guys that know would tell us, where the memory growth actually arises.

      If you run this (having substituted a suitable mem routine for your platform), and then play with the various values, it's really difficult to divine where the growth occurs and what controls how much.

      #! perl -slw
      use strict;
      use Data::Dumper;
      use threads;
      use threads::shared;
      no warnings 'misc';

      our $N ||= 100;
      our $D ||= 1.e5;
      our $SHARED;

      sub mem {
          my @filler = 1 .. $D unless @_;
          my @filler : shared = 1 .. $D if @_;
          my( $usage ) = `tasklist /NH /FI \"pid eq $$\" ` =~ m[ (\S+) \s+ K \s* $ ]x;
          $usage =~ tr[,][]d;
          return 1024 * $usage;
      }

      my @data = 1 .. $D unless $SHARED;
      my @data:shared = 1 .. $D if $SHARED;

      printf "start : %6d\n", my $start = mem;

      for ( 1 .. $N ) {
          my $thread = threads->create( \&mem );
          printf "%3d : %6d\n", $_, $thread->join;
      }

      printf "end : %6d\n", my $end = mem;
      printf "Growth: %6d\n", $end - $start;

      Here are some typical results on my system:


        It's probably something to do with the ref count problem which has been discussed so much before. Just like in Tk, we know it's a ref count problem, but there is no way to keep track of all the possible refs an object creates. So it all boils down to making an arbitrary rule requiring reusing the object, instead of making new ones.

        I'm not really a human, but I play one on earth. flash japh

Node Type: perlmeditation [id://533874]
Approved by xdg
Front-paged by Old_Gray_Bear