strat has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks,

I am just playing around a bit with perl5.8 threads, and would like to write a webserver that can handle many connections under Win2k (well, I know there are OS limits). So I came up with the following code (reduced to the essentials for test purposes):
#! /usr/bin/perl
use warnings;
use strict;

use threads;
use threads::shared;
use IO::Socket::INET;
use IO::Select;
use POSIX;

use constant SERVERPORT => 8090;
use constant CRLF       => "\015\012";

use vars qw($Done);
$Done = 0;
$| = 1;

my $socket = IO::Socket::INET->new(
    LocalPort => SERVERPORT,
    Listen    => SOMAXCONN,
    Reuse     => 1,
) or die "Error: couldn't create listening socket: $!\n";

my $in = IO::Select->new($socket);

print "Listening for connections on port: ", SERVERPORT, "...\n";

while (! $Done) {
    next unless $in->can_read();
    next unless my $conn = $socket->accept;
    threads->new(\&HandleConnection, $conn);
} # while

warn "Normal termination\n";

# =======================================================

sub HandleConnection {
    my $conn   = shift;
    my $thread = threads->self;

    $conn->send("<html><head></head><body>");
    $conn->send("<h1>Testpage</h1>");
    $conn->send("</body></html>");
    $conn->close();

    print "Connection: ", $thread->tid, " finished------\n";
} # HandleConnection

This code works fine (although the HTTP handling is very dirty), but it seems to have a big memory leak. At the beginning it uses about 5 MB of RAM, but every new thread adds about 2-3 MB (if I comment out the threads->new(\&HandleConnection, $conn); line, the RAM usage stays constant).

Is there a way to free the memory used by a thread with ActiveState Perl Build 805, or do I have to use a fixed pool of threads? A pool may become difficult, because I'd like to write a chat webserver, e.g. like Poor man's webchat from merlyn, and since HTTP::Daemon is not thread-safe, I'd like to build my own thread-safe HTTP-handling module. I really would like to do it with threads to get a better feeling for them. I don't want to use POE, because some threads may run for several hours outputting data; I don't want to use fork because of Windows; and the same goes for Coro(utines).

Best regards,
perl -e "s>>*F>e=>y)\*martinF)stronat)=>print,print v8.8.8.32.11.32"

Replies are listed 'Best First'.
Re: threads and RAM-Usage
by BrowserUk (Patriarch) on Jun 14, 2003 at 12:26 UTC

    Please ensure that you read the caveat at the bottom of this post.

    As each thread is effectively a copy of everything in the main thread at the time of its creation, you should minimise the amount of memory used by the main thread prior to creation.

    One thing that can help is to avoid use-ing modules that aren't needed within your threads: create your thread pool with only the modules the threads themselves need loaded. Once you have created your threads, you can then require the modules used only by your main thread.
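
    A rough sketch of that ordering (the worker body, pool size of 5, and POSIX as the "main-thread-only" module are just placeholders to show the idea):

    #! /usr/bin/perl
    use strict;
    use warnings;
    use threads;                   # the workers need this, so it is loaded up front

    # Create the pool while the interpreter is still small; each worker
    # starts life as a copy of the main thread as it exists right now.
    my @pool = map { threads->new( sub { sleep 5 } ) } 1 .. 5;

    # Loaded only now, so the workers were created without it in memory.
    require POSIX;

    # ... main-thread-only work here ...

    $_->join for @pool;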

    Another thing that can help (in perl scripts in general, not just threaded ones) is to load only those parts of modules that you intend to use. One thing I notice is that you are doing use POSIX;. I couldn't see what you are using it for, but as an example, a plain use POSIX; on my system adds about 1.7 MB to the footprint of the main thread. However, if all I want to use is (for example) strftime(), then doing use POSIX qw[strftime]; only adds around 400k, and avoiding any auto-imports by doing use POSIX (); reduces this to around 200k. By auto-importing only the stuff you actually need, or by auto-importing nothing and using the subs via their fully qualified names (e.g. POSIX::strftime(...);), you can save a significant amount of memory. The thing to remember is that if you save 1 MB in your main thread by this method, you can multiply that saving by the number of threads you create.
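
    For example, the three loading styles just described look like this (they are alternatives; pick one per module):

    use POSIX;                  # auto-imports everything POSIX exports by default
    use POSIX qw(strftime);     # imports only strftime
    use POSIX ();               # imports nothing; use fully qualified names:

    print POSIX::strftime( "%Y-%m-%d %H:%M:%S", localtime ), "\n";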

    You could also reduce the size of your threads by creating them (as a pool) before you create variables and objects in your main thread. This involves creating a number of threads that sit in a blocked state waiting for some work to do. I favour blocking on a read from a queue created with Thread::Queue, but you could use locked shared vars and signals, or semaphores. I've had some success with the latter, though generally not much success using the cond_*() functions; that is probably my fault rather than any inherent flaw.

    I have just found the concept of using queues (something I am familiar with from other environments) the easiest to get right.

    Having created your thread pool, you set up your listener in the main thread and then pass each connection to the next available thread to process. This is slightly more complex, but done well it has several advantages, not least because creating and destroying threads is a fairly costly operation (though theoretically less so than forking a process), so re-using a pool of threads for subsequent connections gives a significant performance advantage.
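
    A minimal sketch of that shape, with workers blocking on a Thread::Queue and an undef per worker used as a shut-down signal (the pool size and the string "jobs" are placeholders, not a drop-in replacement for the code above):

    #! /usr/bin/perl
    use strict;
    use warnings;
    use threads;
    use Thread::Queue;

    use constant POOLSIZE => 5;    # placeholder pool size

    my $work = Thread::Queue->new;

    # Create the workers first, while the main thread is still small.
    my @pool = map { threads->new( \&Worker ) } 1 .. POOLSIZE;

    # ... set up the listener and any large data structures here ...

    # Hand work to the next free worker simply by enqueueing it.
    $work->enqueue( "job $_" ) for 1 .. 20;

    # One undef per worker tells it to finish; joining reclaims its memory.
    $work->enqueue( undef ) for @pool;
    $_->join for @pool;

    sub Worker {
        my $tid = threads->self->tid;
        # dequeue() blocks until an item is available.
        while ( defined( my $job = $work->dequeue ) ) {
            print "thread $tid handling $job\n";
        }
    }

    For the connection hand-off itself, one common approach is to enqueue fileno($conn) and have the worker wrap the descriptor with open my $client, '+<&=', $fd; the main thread then has to take care not to close (or let perl destroy) the accepted socket before the worker has finished with it.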

    It also has what can be seen as a limitation: without taking special steps to 'grow' the pool if you get more concurrent connections than you have allocated threads, you risk having to refuse connections when things get busy. I don't see this as a disadvantage. The ability to set limits on the connections (threads) allows you to budget your memory etc. and prevents peak loads from pushing the server into swapping. It also limits the ability of a DoS attack to push your server beyond its limits.

    If you do decide to create and destroy threads on the fly, don't detach them. If you aren't interested in the return value from a thread, it is tempting to detach it and let it die a natural death without waiting to be joined. Unfortunately, my experimental evidence seems to show that whilst most if not all of the memory from a joined thread gets returned to the OS, the memory for a detached thread seems to persist after the thread dies and never seems to get re-used.
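
    In other words, hang on to the thread object and join it, even if the return value is of no interest; a trivial illustration:

    use threads;

    my $thr = threads->new( sub { return } );

    $thr->join;      # the thread's memory is reclaimed once it has been joined
    # $thr->detach;  # in the tests described above, this memory was not reclaimed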

    Caveat: All this information is based upon my own experimentation and a little informed guesswork. I'm no 'expert' in threads, at least not perl's threads. I've just made some effort to try and work out some of the whys and wherefores, in the absence of much (any?) existing 'prior art' on the subject. As was true of other new 'perl things' in the past, it will take a while for a body of best practices to be established, and I only hope that I can contribute to it. I'm more than happy to pass on what I think I know, and to try to help answer any questions that arise from it, but on the understanding that you may well be better off joining the perl5 threads mailing list and asking your questions of the people that really know. :)


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller


Re: threads and RAM-Usage
by deadkarma (Monk) on Jun 14, 2003 at 22:13 UTC
    Take a look at http://perlmonks.org/index.pl?node_id=218513

      One *huge* problem I've seen with threads on Win32 is that, for some reason, after creating a certain number of undetached threads (on my WinXP box it's 120) the entire process simply exits: no warning, no nothing. Anyhow, hope this helps.

      And why did you create 120 threads? What was the application?

      Show me your design that requires you to run 120 threads and I'll pretty much guarantee that I can redesign it to use fewer, maybe 10. And it will be considerably more efficient and more responsive. A process with 120 threads would spend so much time context switching that it would have little or no time left for processing.

      Threads can be a very effective tool for solving certain kinds of problems, but like all the best medicines, they are best used sparingly. This is doubly true of perl's ithreads implementation.


      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller


        You're right, 120 threads is quite excessive, and I can't think of a situation where >120 threads would be needed either.

        I was using the above code to write a custom SMTP server for spam prevention, and we anticipated that it would be heavily hit. During my benchmarks I wanted to simulate the unlikely event of a few hundred simultaneous connections to see how well it performed under that kind of pressure. The entire process was shutting down with no errors and no messages, and it wasted many hours of my life trying to figure out why.

        I suppose it's not really a *huge* problem in theory, but when you don't know why something happens, it could become a huge problem.