Re: threads and RAM-Usage

Please ensure that you read the caveat at the bottom of this post.

As each thread is effectively a copy of everything in the main thread at the time of its creation, you should minimise the amount of memory used by the main thread prior to creation.

One thing that can help is to avoid use-ing modules that aren't needed within your threads, by creating your thread pool with only the modules they use loaded. Once you have created your threads, you can then require the modules used only by your main thread.

Another thing that can help (in perl scripts in general, not just threaded ones :) is to load only those parts of modules that you intend to use. One thing I notice is that you are doing use POSIX;. I couldn't see what you are using it for but, as an example, if I do a plain use POSIX; on my system, it adds about 1.7 MB to the footprint of the main thread. However, if all I want to use is (for example) strftime(), then doing use POSIX qw[strftime]; adds only around 400k, and avoiding any imports at all by doing use POSIX (); reduces this to around 200k. By importing only the stuff you actually need, or by importing nothing and using the subs via their fully qualified names (e.g. POSIX::strftime(...);), you can save a significant amount of memory. The thing to remember is that if you save 1 MB in your main thread by this method, then you can multiply that saving by the number of threads you create.
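As a minimal sketch of the import-nothing approach described above (the format string and the use of gmtime(0) are just my choices for illustration, nothing from your script):

```perl
use strict;
use warnings;

# Load POSIX, but import none of its names into our namespace.
use POSIX ();

# Call the sub via its fully qualified name instead.
# gmtime(0) is the epoch, so the date printed is fixed.
my $stamp = POSIX::strftime( '%Y-%m-%d', gmtime(0) );
print "$stamp\n";    # prints 1970-01-01
```

The memory figures quoted above are from my system; measure on your own before relying on them.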

You could also reduce the size of your threads by creating them (as a pool) before you create variables and objects in your main thread. This involves creating a number of threads that sit in a blocked state waiting for some work to do. I favour blocking on a read from a queue created with Thread::Queue, but you could use locked shared vars & signals, or semaphores. I've had some success with the latter, but generally not much with the cond_*() functions, though that is probably my fault rather than any inherent flaw.

I have just found the concept of using queues (something I am familiar with from other environments) the easiest to get right.

Having created your thread pool, you set up your listener in the main thread and then pass each connection to the next available thread to process. This is slightly more complex, but done well it has several advantages, not least of which is that creating and destroying threads is a fairly costly operation (though theoretically less so than forking a process), so using a pool of threads that get re-used by subsequent connections has a significant performance advantage.
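A rough sketch of the pool-and-queue arrangement, under some loud assumptions: it needs a perl built with threads, the pool size of 4 is made up, the "work" is just doubling a number, and plain integers stand in for connections (handing a real socket to a worker takes more care and is not shown here). Data::Dumper is only a stand-in for "a module the main thread alone needs".

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $POOL_SIZE = 4;                       # hypothetical pool size
my $work      = Thread::Queue->new;      # jobs for the workers
my $results   = Thread::Queue->new;      # answers back to the main thread

# Create the pool first, while the main thread is still small.
my @pool;
for ( 1 .. $POOL_SIZE ) {
    push @pool, threads->create( sub {
        # dequeue() blocks until an item arrives; undef means "shut down".
        while ( defined( my $job = $work->dequeue ) ) {
            $results->enqueue( $job * 2 );    # stand-in for real work
        }
    } );
}

# Only now load modules and build data that the main thread alone needs.
require Data::Dumper;

$work->enqueue($_) for 1 .. 10;          # hand out the work
$work->enqueue(undef) for @pool;         # one terminator per worker
$_->join for @pool;                      # wait for the workers to finish

my @done = sort { $a <=> $b } map { $results->dequeue } 1 .. 10;
print "@done\n";
```

The workers never get created or destroyed per job; they just go back to blocking on the queue, which is the whole point of the pool.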

It also has what can be seen as a limitation: unless you take special steps to 'grow' the pool, if you get more concurrent connections than you allocated threads, you risk having to refuse connections when things get busy. I don't see this as a disadvantage. The ability to set limits on the connections (threads) allows you to budget your memory etc. and avoid peak loads pushing the server into swapping. It also limits the ability of a DoS attack to push your server beyond its limits.

If you do decide to create and destroy them on the fly, don't detach them. If you aren't interested in the return value from the threads, it is tempting to detach them and let them die a natural death without waiting to be joined. Unfortunately, my experimental evidence seems to show that whilst most, if not all, of the memory from a joined thread gets returned to the OS, the memory for a detached thread seems to persist after the thread dies and never seems to get re-used.
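If you go the create-on-the-fly route, something along these lines keeps every thread joinable and reaps the finished ones as you go. This is only a sketch: handle_one() is a made-up stand-in for your per-connection code, and it assumes a version of threads recent enough to support threads->list(threads::joinable).

```perl
use strict;
use warnings;
use threads;

# Hypothetical per-connection handler.
sub handle_one {
    my ($id) = @_;
    return "done $id";
}

my @results;
for my $id ( 1 .. 3 ) {
    # Scalar context here, so that join() hands back the return value.
    my $thr = threads->create( \&handle_one, $id );

    # Reap whatever has already finished rather than detaching.
    push @results, $_->join for threads->list(threads::joinable);
}

# Join the stragglers; join() blocks until each one ends.
push @results, $_->join for threads->list;

print "$_\n" for sort @results;
```

Even if you throw the return values away, the join is what matters for getting the memory back, at least in my experiments.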

Caveat: All this information is based upon my own experimentation and a little informed guesswork. I'm no 'expert' in threads, at least not perl's threads. I've just made some effort to try and work out some of the whys and wherefores, in the absence of much (any?) existing 'prior art' on the subject. As I'm sure was true of other new 'perl things' in the past, it will take a while for a body of best practices to be established, and I only hope that I can contribute to this. I'm more than happy to pass on what I think I know, and to try and help answer any questions that arise from it, but on the understanding that you may well be better off joining the perl5 threads mailing list and asking your questions of the people that really know :)

Examine what is said, not who speaks.

"Efficiency is intelligent laziness." -David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
