in reply to Handling multiple clients

I wasn't sure myself, so I just did a simple-minded test, and sure enough, when the child starts up, it takes up as much memory as the parent, which means that you're getting a full copy of your 2GB in-memory data each time you fork. Forking 5 children would pretty much guarantee that the OS will need to do a lot of memory swapping to run all those huge child processes. I think the delays you're seeing are not so much the CPU load of the children, but rather the I/O wait imposed by swapping. (Some versions of "top" will report the total percentage of processing time devoted to "I/O wait" -- if your version of "top" shows that, you'll probably see it skyrocket.)

If you want some sort of approach that actually shares a single copy of the 2GB data set among multiple clients that are being served simultaneously, I think you'll need threads rather than forking. I'm not a reliable source on this, 'cuz I've never used threads myself, but... if I'm not mistaken (no guarantee on that), one of the advantages of threading is that you really can share a single store of in-memory data across threads, whereas you can't do that across children forked from a given parent. I hope others can elaborate from personal experience...
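
To make that concrete, here's a minimal, untested sketch of the kind of thing I mean. It assumes a Perl built with ithreads; the hash name, worker sub, and the data in it are made-up stand-ins for the real 2GB set. Only the structure marked shared is visible to all threads; anything else gets copied per thread.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use threads;
    use threads::shared;

    # One explicitly shared structure, visible to every worker thread.
    my %lookup :shared;
    $lookup{'10.0.0.0/8'} = 'internal';   # made-up data standing in for the 2GB set

    sub handle_client {
        my ($client_id) = @_;
        # read-only access to the shared hash from any thread
        return "client $client_id sees: " . ($lookup{'10.0.0.0/8'} || 'unknown');
    }

    my @workers = map { threads->create(\&handle_client, $_) } 1 .. 5;
    print $_->join(), "\n" for @workers;

Whether this actually wins depends on how much of the 2GB can live in structures that are explicitly shared, as discussed in the replies below.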

Meanwhile, you may want to reassess your requirements. How important is it, really, for multiple clients to be serviced in parallel (given that doing so might not be doable without a serious loss of efficiency)? Is there any chance the process could work from a MySQL database, rather than from in-memory storage? (Multiple concurrent accesses to a 2GB dataset are a lot easier to implement efficiently using a real RDBMS, and MySQL is pretty zippy for a lot of tasks.)
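
For what it's worth, the database route looks roughly like this -- just a hedged sketch, with invented database, table, and column names; each forked child (or thread) would open its own handle rather than inheriting 2GB of Perl data. It assumes DBI and DBD::mysql are installed.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;

    # Connection details and schema below are made up for illustration.
    my $dbh = DBI->connect('DBI:mysql:database=lookups;host=localhost',
                           'someuser', 'somepassword', { RaiseError => 1 });

    my $sth = $dbh->prepare('SELECT value FROM big_dataset WHERE key_col = ?');

    sub lookup {
        my ($key) = @_;
        $sth->execute($key);
        my ($value) = $sth->fetchrow_array;
        $sth->finish;
        return $value;
    }

    my $value = lookup('some-key');
    print defined $value ? $value : 'not found', "\n";

    $dbh->disconnect;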

Re^2: Handling multiple clients
by chromatic (Archbishop) on Sep 05, 2004 at 03:54 UTC

    What operating system do you use and how did you measure memory usage? I expect anything decent to share all of the pages, marking them Copy-on-Write.

    As far as I understand Perl threads, every new interpreter copies everything not explicitly shared. I'd expect that to do even worse for the poster's question.

      What operating system do you use and how did you measure memory usage?

      Mac OS X 10.3 (Panther) / Darwin 7.5.0; when I said "simple-minded", I meant it:

      perl -e '$|=1; @a=(0..10_000_000); $child = fork(); die "fork failed\n" unless (defined $child); print "parent = $$\nchild = $child\n" if $child; sleep 30'
      and while that was running, do "top" in another window; both processes showed up with the same size.
      I expect anything decent to share all of the pages, marking them Copy-on-Write.
      I guess I'd want to test different cases, with different amounts of data and a more realistic set of operations, to see whether I get what you expect. (I probably won't do that, actually -- it's not the sort of thing I need...)
      As far as I understand Perl threads, every new interpreter copies everything not explicitly shared. I'd expect that to do even worse for the poster's question.
      Thanks for the clarification about threads. I'll grant that my experience with the concept of data sharing across processes is limited. (I'm sure I studied the C functions that create shared memory in Solaris years ago -- and I might even have used them a couple times...) As for threads, I might use them some day, and till then, I guess I should keep my mouth shut about them.

      (update: ...um, if the OP happens to have 2GB organized into a few hefty data structures, and those are explicitly shared, why would that be worse than forking? Are the methods for declaring what is shared really unpleasant, or something?)

        both processes showed up with the same size.

        Yes, but which size? top shows a lot of information. If you don't know what the columns mean, your interpretation can be wildly wrong. Running the following very naive and simple program on my laptop (Linux PPC with a 2.4 kernel) and looking at top shows two processes of about 40 MB apiece -- but the amount of shared memory for each process is exactly the same.

        #!/usr/bin/perl
        use strict;
        use warnings;

        my @foo;
        $foo[10000000] = 1;     # grow the array to ~10 million elements

        fork and sleep 10;      # fork() is true in the parent, so only the parent sleeps here
        sleep 10;               # both parent and child sleep here
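
        If you want numbers rather than eyeballing top, here is a minimal, Linux-only sketch: it reads /proc/<pid>/statm, whose first three fields are total, resident, and shared pages. (There is no /proc on Mac OS X, so this only applies on Linux; the array size and sleep times are arbitrary.)

        #!/usr/bin/perl
        use strict;
        use warnings;

        # Report total/resident/shared pages for the current process (Linux only).
        sub statm {
            open my $fh, '<', "/proc/$$/statm" or die "cannot read /proc/$$/statm: $!";
            my ($size, $resident, $shared) = split ' ', scalar <$fh>;
            return ($size, $resident, $shared);
        }

        my @big = (0 .. 1_000_000);   # allocate something noticeable before forking

        my $pid = fork();
        die "fork failed: $!" unless defined $pid;

        sleep 1;                      # let both processes settle
        my ($size, $resident, $shared) = statm();
        printf "%s %d: size=%d resident=%d shared=%d (pages)\n",
            $pid ? 'parent' : 'child', $$, $size, $resident, $shared;
        wait if $pid;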

        As for the question of why Perl's ithreads are worse than forking, having seen the ithreads code only a couple of times and not being an expert on memory management by any means, I suspect Perl doesn't take advantage of the COW features of decent kernels. I make this guess because I don't know of any way that Perl could hint to the kernel to share specific memory pages.

        Fork will share the memory. If one process modifies a place in memory, its whole 4K page will get copied and made exclusive to that process. This means that although top will show all your processes using 2GB of virtual memory, they in fact share the 2GB of physical memory you are using, provided none of them modifies any of the data. For further optimisation, you should open+mmap+close the data files instead of doing open+read+close, so that you really use the data files as the backing store for your queries. The OS will then optimize the memory as best it can.
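
        A rough sketch of that open+mmap approach, assuming the File::Map module from CPAN and a made-up file path; the mapped scalar is backed directly by the file's pages, which the kernel can share across processes and drop or reload as it pleases.

        #!/usr/bin/perl
        use strict;
        use warnings;
        use File::Map qw(map_file);   # CPAN module, not core

        # '/path/to/big_dataset.bin' is a placeholder for the real data file.
        map_file my $data, '/path/to/big_dataset.bin', '<';   # read-only mapping

        # $data now behaves like an ordinary read-only string backed by the file.
        printf "mapped %d bytes; first 16: %s\n", length $data, substr($data, 0, 16);
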
      RedHat 9 with the latest updates before they stopped updating.
Re^2: Handling multiple clients
by jalewis2 (Monk) on Sep 05, 2004 at 20:51 UTC
    I didn't think it was relevant, but I am using Net::Patricia for my data storage.
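
    For anyone unfamiliar with the module, loading and querying a Net::Patricia trie looks roughly like this -- the prefixes and payloads below are made up for illustration, not the OP's actual data.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Net::Patricia;

    my $pt = Net::Patricia->new;

    # Made-up prefixes standing in for the real in-memory data set.
    $pt->add_string('10.0.0.0/8',     'internal');
    $pt->add_string('192.168.0.0/16', 'lab');

    # Longest-prefix match returns whatever payload was stored with add_string.
    my $hit = $pt->match_string('10.1.2.3');
    print defined $hit ? $hit : 'no match', "\n";
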
Re^2: Handling multiple clients
by jalewis2 (Monk) on Sep 05, 2004 at 20:46 UTC
    I thought this might be the case, but the top included with RH9 wasn't showing the children using 2GB of memory.

    The fork manpage says that everyone has a copy of whatever was in the parent's memory, but I couldn't prove it.