jalewis2 has asked for the wisdom of the Perl Monks concerning the following question:

I have a perl script that starts a server and loads 2GB of data into memory. It also listens on a TCP port and accepts clients. Clients connect to the server and are able to submit a query and get a result from the data in memory. Currently, only one client can connect at a time.

I want to make this server able to accept multiple clients at once, and I am having trouble. I used the pre-forked server code from the Perl Cookbook, and it seems to work for a little while, but then it takes almost all the CPU and eventually hangs the box it is running on.

At first I thought there might be an issue with loading 2GB of data and then having multiple clients access it. After some troubleshooting, it seemed that the children weren't exiting. I added code to close each client connection, but now it seems the children aren't being restarted after they are killed.

My question is, am I going about this all wrong? Is there a better solution than preforking? There may be an issue with shared memory, but I am not clear enough on what is happening to troubleshoot it.

Here is the code from the cookbook:

#!/usr/bin/perl
# preforker - server who forks first

use IO::Socket;
use Symbol;
use POSIX;

# establish SERVER socket, bind and listen.
$server = IO::Socket::INET->new(LocalPort => 6969,
                                Type      => SOCK_STREAM,
                                Proto     => 'tcp',
                                Reuse     => 1,
                                Listen    => 10 )
    or die "making socket: $@\n";

# global variables
$PREFORK               = 5;    # number of children to maintain
$MAX_CLIENTS_PER_CHILD = 5;    # number of clients each child should process
%children              = ();   # keys are current child process IDs
$children              = 0;    # current number of children

sub REAPER {                        # takes care of dead children
    $SIG{CHLD} = \&REAPER;
    my $pid = wait;
    $children--;
    delete $children{$pid};
}

sub HUNTSMAN {                      # signal handler for SIGINT
    local($SIG{CHLD}) = 'IGNORE';   # we're going to kill our children
    kill 'INT' => keys %children;
    exit;                           # clean up with dignity
}

# Fork off our children.
for (1 .. $PREFORK) {
    make_new_child();
}

# Install signal handlers.
$SIG{CHLD} = \&REAPER;
$SIG{INT}  = \&HUNTSMAN;

# And maintain the population.
while (1) {
    sleep;                          # wait for a signal (i.e., child's death)
    for ($i = $children; $i < $PREFORK; $i++) {
        make_new_child();           # top up the child pool
    }
}

sub make_new_child {
    my $pid;
    my $sigset;

    # block signal for fork
    $sigset = POSIX::SigSet->new(SIGINT);
    sigprocmask(SIG_BLOCK, $sigset)
        or die "Can't block SIGINT for fork: $!\n";

    die "fork: $!" unless defined ($pid = fork);

    if ($pid) {
        # Parent records the child's birth and returns.
        sigprocmask(SIG_UNBLOCK, $sigset)
            or die "Can't unblock SIGINT for fork: $!\n";
        $children{$pid} = 1;
        $children++;
        return;
    } else {
        # Child can *not* return from this subroutine.
        $SIG{INT} = 'DEFAULT';      # make SIGINT kill us as it did before

        # unblock signals
        sigprocmask(SIG_UNBLOCK, $sigset)
            or die "Can't unblock SIGINT for fork: $!\n";

        # handle connections until we've reached $MAX_CLIENTS_PER_CHILD
        for ($i = 0; $i < $MAX_CLIENTS_PER_CHILD; $i++) {
            $client = $server->accept() or last;
            # do something with the connection
        }

        # tidy up gracefully and finish
        # this exit is VERY important, otherwise the child will become
        # a producer of more and more children, forking yourself into
        # process death.
        exit;
    }
}

Replies are listed 'Best First'.
Re: Handling multiple clients
by graff (Chancellor) on Sep 05, 2004 at 03:48 UTC
    I wasn't sure myself, so I just did a simple-minded test, and sure enough, when the child starts up, it takes up as much memory as the parent, which means that you're getting a full copy of your 2GB of in-memory data each time you fork. Forking 5 children would pretty much guarantee that the OS will need to do a lot of memory swapping to run all those huge child processes. I think the delays you're seeing are not so much the CPU load of the children, but rather the I/O wait imposed by swapping. (Some versions of "top" will report the total percentage of processing time devoted to "i/o wait" -- if your version of "top" shows that, you'll probably see it skyrocket.)

    If you want some sort of approach that actually shares a single copy of the 2GB data set among multiple clients that are being served simultaneously, I think you'll need threads rather than forking. I'm not a reliable source on this, 'cuz I've never used threads myself, but... if I'm not mistaken (no guarantee on that), one of the advantages of threading is that you really can share a single store of in-memory data across threads, whereas you can't do that across children forked from a given parent. I hope others can elaborate from personal experience...

    Meanwhile, you may want to reassess your requirements. How important is it, really, for multiple clients to be serviced in parallel (given that doing so might not be possible without a serious loss of efficiency)? Is there any chance the process could work from a MySQL database rather than from in-memory storage? (Multiple concurrent access to a 2GB dataset is a lot easier to implement efficiently using a real RDBMS, and MySQL is pretty zippy for a lot of tasks.)

      What operating system do you use and how did you measure memory usage? I expect anything decent to share all of the pages, marking them Copy-on-Write.

      As far as I understand Perl threads, every new interpreter copies everything not explicitly shared. I'd expect that to do even worse for the poster's question.
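      A minimal sketch of the distinction (variable names invented for illustration): only variables marked ": shared" are visible across threads; everything else is cloned into each new thread, eagerly rather than copy-on-write.

      use strict;
      use warnings;
      use threads;
      use threads::shared;

      my @shared_data : shared = ( 1 .. 5 );  # one copy, visible to all threads
      my @private_data         = ( 1 .. 5 );  # cloned into every new thread

      threads->create( sub {
          $shared_data[0]  = 99;   # every thread sees this change
          $private_data[0] = 99;   # changes this thread's private clone only
      } )->join;

      print "shared:  $shared_data[0]\n";   # 99
      print "private: $private_data[0]\n";  # still 1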

        What operating system do you use and how did you measure memory usage?

        Mac OS X 10.3 (Panther) / Darwin 7.5.0; when I said "simple-minded", I meant it:

        perl -e '$|=1; @a=(0..10_000_000); $child = fork(); die "fork failed\n" unless (defined $child); print "parent = $$\nchild = $child\n" if $child; sleep 30'
        and while that was running, do "top" in another window; both processes showed up with the same size.
        I expect anything decent to share all of the pages, marking them Copy-on-Write.
        I guess I'd want to test different cases, with different amounts of data and a more realistic set of operations, to see whether I get what you expect. (I probably won't do that, actually -- it's not the sort of thing I need...)
        As far as I understand Perl threads, every new interpreter copies everything not explicitly shared. I'd expect that to do even worse for the poster's question.
        Thanks for the clarification about threads. I'll grant that my experience with the concept of data sharing across processes is limited. (I'm sure I studied the C functions that create shared memory in Solaris years ago -- and I might even have used them a couple times...) As for threads, I might use them some day, and till then, I guess I should keep my mouth shut about them.

        (update: ...um, if the OP happens to have 2GB organized into a few hefty data structures, and those are explicitly shared, why would that be worse than forking? Are the methods for declaring what is shared really unpleasant, or something?)

        RedHat 9 with the latest updates before they stopped updating.
      I didn't think it was relevant, but I am using Net::Patricia for my data storage.
      I thought this might be the case, but the top included with RH9 wasn't showing the children using 2GB of memory.

      The fork manpage says that each child gets a copy of whatever was in the parent's memory, but I couldn't prove it.

Re: Handling multiple clients (use threads)
by BrowserUk (Patriarch) on Sep 05, 2004 at 11:50 UTC

    Provided that you create the threads before loading your data, a threaded server works fine. Only the main thread has a copy of the large volume of data, whilst sharing the requests and replies with the server threads through shared memory (Thread::Queue):

    #! perl -slw
    use strict;
    use IO::Socket;
    use threads qw[ yield ];
    use threads::shared;
    use Thread::Queue;

    $| = 1;
    our $THREADS ||= 5;

    my $listening : shared = 0;

    our $ios = IO::Socket::INET->new(
        LocalPort => 6969,
        Type      => &IO::Socket::SOCK_STREAM,
        Proto     => 'tcp',
        Reuse     => 1,
        Listen    => 100,
    ) or die "IO::S->new failed with $!";
    print "$ios";

    sub server {
        $listening++;
        my( $Qquery, $Qreply ) = @_;
        my $tid = threads->self->tid;
        print "tid:$tid";

        ## Give the other threads a chance to get up and running.
        yield until $listening == $THREADS;

        while( my $client = $ios->accept() ) {
            chomp( my $query = <$client> );
            # print "$tid: $client got: '$query'";
            $Qquery->enqueue( "$tid:$query" );
            my $reply = $Qreply->dequeue();
            print $client $reply;
            close $client;
        }
        $listening--;
    }

    my @Qs = map{ new Thread::Queue } 0 .. $THREADS;

    threads->new( \&server, $Qs[ 0 ], $Qs[ $_ ] )->detach for 1 .. $THREADS;
    yield until $listening == $THREADS;

    print "Threads $listening running; grabbing data";

    open BIGFILE, '< :raw', 'data/50MB.dat' or die "data/50mb.dat: $!";
    my $data;
    sysread( BIGFILE, $data, -s( BIGFILE ) ) or die "sysread BIGFILE : $!";
    close BIGFILE;

    while( $listening ) {
        my( $tid, $msg ) = split ':', $Qs[ 0 ]->dequeue();
        ## Process the request
        print "Received '$msg' from $tid";
        $Qs[ $tid ]->enqueue( 'Thankyou for your enquiry' );
    }

    Partial server log

    Client:

    #! perl -slw
    use strict;
    use IO::Socket;

    my $socket = IO::Socket::INET->new(
        PeerAddr => '127.0.0.1',
        PeerPort => 6969,
        Proto    => "tcp",
        Type     => SOCK_STREAM
    ) or die "Couldn't connect to 127.0.0.1:6969 : $@";

    # ... do something with the socket
    print $socket "Why don't you call me anymore?";
    chomp( my $answer = <$socket> );
    print "Got: $answer";

    # and terminate the connection when we're done
    close($socket);

    As is, the server doesn't contain any mechanism for shutting it down, but ^C works okay and a SIGINT handler could deal with cleanup if required.


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
    "Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon
      Thanks to everyone that replied. I think I misunderstood what exactly the fork was doing in relation to what I wanted to accomplish.

      I am going to give BrowserUk's code a try.

      Thanks again for the quick response.

Re: Handling multiple clients
by lidden (Curate) on Sep 05, 2004 at 01:27 UTC
    Are you handling your dead children? That is, catching SIGCHLD? Maybe something like:
    $SIG{CHLD} = 'IGNORE';
    will help you, although you may want to do something better than just ignoring them.
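    For instance (a sketch against the cookbook's %children bookkeeping above): a reaper that loops on waitpid with WNOHANG picks up every dead child, even when several exit before the handler runs.

    use POSIX ':sys_wait_h';

    sub REAPER {
        # reap every child that has already exited, not just one
        while ( ( my $pid = waitpid( -1, WNOHANG ) ) > 0 ) {
            $children--;
            delete $children{$pid};
        }
        $SIG{CHLD} = \&REAPER;   # reinstall, for older SysV-style signal semantics
    }
    $SIG{CHLD} = \&REAPER;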
      Yes, I added the cookbook code to my original post.
Re: Handling multiple clients
by johnnywang (Priest) on Sep 05, 2004 at 07:00 UTC
    I just wrote a little multi-client server at work. It's a request-response type of situation, i.e., the connections are short-lived. Instead of forking, I used threads. One does need to share data explicitly. Some sample code follows (any comments are appreciated; it hasn't been used in a heavy production environment, so I'm not sure how it scales):
    use strict;
    use threads;
    use threads::shared;
    use IO::Socket;

    # need to explicitly share variables across threads
    our $something_to_share;
    share($something_to_share);

    my $socket = new IO::Socket::INET(
        LocalPort => 11023,
        Proto     => "tcp",
        Listen    => 10,
        Reuse     => 1
    ) or die "Socket could not be created, reason: $!";

    while( my $client = $socket->accept() ){
        threads->new(\&handler, $client)->detach();
    }
    exit(0);

    sub handler {
        my $client = shift;
        # read the request
        my $input = <$client>;
        # do something, probably access the
        # shared variable $something_to_share,
        # and send a response back
        print $client "something for you.";
    }
Re: Handling multiple clients
by kscaldef (Pilgrim) on Sep 05, 2004 at 06:07 UTC

    I've found that the Cookbook code for dealing with SIGCHLD seems to be a bit unreliable (See $? is -1???). As best I could ever figure, despite documentation to the contrary, it appeared that the signal handling didn't actually handle reentrancy correctly.

    However, I think you could get rid of the signal handler and just do something like replace

    # And maintain the population.
    while (1) {
        sleep;                      # wait for a signal (i.e., child's death)
        for ($i = $children; $i < $PREFORK; $i++) {
            make_new_child();       # top up the child pool
        }
    }
    with
    # And maintain the population.
    while ((my $pid = waitpid(-1, 0)) > 0) {   # block until any child exits
        $children--;
        delete $children{$pid};
        make_new_child();
    }
      hi jalewis2,
      I am not much of a Perl programmer yet (learning, learning...).
      But if I were to do it in C, I would have a server process listening for requests. Since the main task seems to be data queries, it would load the data into memory at startup. Upon receiving a request it would call a function that returns after forking a child (worker) thread, which does the work and exits. You can maintain the thread count in the server process. I don't think preforking is a good idea, as it keeps consuming resources even when there is no work to do.
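      In Perl, that fork-on-demand approach (one worker per connection instead of a preforked pool) would look roughly like the sketch below; the query handling is a placeholder for whatever the OP's server actually does.

      use strict;
      use warnings;
      use IO::Socket;

      $SIG{CHLD} = 'IGNORE';   # let the OS reap finished workers

      my $server = IO::Socket::INET->new(
          LocalPort => 6969,
          Proto     => 'tcp',
          Listen    => 10,
          Reuse     => 1,
      ) or die "listen: $@";

      while ( my $client = $server->accept ) {
          my $pid = fork;
          die "fork: $!" unless defined $pid;
          if ( $pid == 0 ) {        # worker child
              close $server;        # the worker never accepts connections
              my $query = <$client>;
              # ... answer the query here (placeholder) ...
              print $client "result\n";
              exit 0;               # must not fall back into the accept loop
          }
          close $client;            # parent's copy of the connection
      }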

        Depending on the type of server you are writing, you may not want the expense of forking for each request. There is certainly a movement in OS design to make forking cheap, but you still shouldn't assume it is without cost. If you typically have short sessions that require high performance and low latency, you probably want to prefork children.

        I'm not sure what resources you are worried about the children consuming. COW implementations of forking mean that the children will use only minimal additional memory unless they have to. If the children are simply blocking on a select call, they won't be using any significant amount of CPU either.

Re: Handling multiple clients
by quai (Novice) on Sep 05, 2004 at 09:14 UTC
    Take a look at "17.13. Non-Forking Servers" in Perl Cookbook from O'Reilly.

    "Problem:You want a server to deal with several simultaneous connections, but you don't want to fork a process to deal with each connection."

Re: Handling multiple clients
by zentara (Cardinal) on Sep 05, 2004 at 13:09 UTC
    I can't find the node offhand, but I seem to remember a variant of this question being asked recently, and the best advice was to load the "huge data" into a ramdisk, so everything can access it without your code having to worry about "sharing" it across clients. Or use a database. Just a thought.

    I'm not really a human, but I play one on earth. flash japh
Re: Handling multiple clients
by mkirank (Chaplain) on Sep 07, 2004 at 15:12 UTC
    2GB of data in memory will slow down your system and your application. Use DBD::SQLite2: you will not have the problems of installing a database server, and it can speed up your process as well.
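    A minimal sketch of that route (the table and column names are invented for illustration): the whole "database" is a single file, so there is nothing to install or administer, and every client can query it concurrently.

    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect( 'dbi:SQLite2:dbname=lookup.db', '', '',
                            { RaiseError => 1 } );

    # an indexed table stands in for the 2GB in-memory structure
    my $sth = $dbh->prepare('SELECT result FROM lookups WHERE key = ?');
    $sth->execute('some key');
    my ($result) = $sth->fetchrow_array;
    print "$result\n" if defined $result;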