in reply to Re^2: Handling multiple clients
in thread Handling multiple clients

What operating system do you use and how did you measure memory usage?

Mac OS X 10.3 (Panther)/Darwin 7.5.0; when I said "simple-minded", I meant it:

perl -e '$|=1; @a=(0..10_000_000); $child = fork(); die "fork failed\n" unless (defined $child); print "parent = $$\nchild = $child\n" if $child; sleep 30'
and while that was running, I did "top" in another window; both processes showed up with the same size.
I expect anything decent to share all of the pages, marking them Copy-on-Write.
I guess I'd want to test different cases, with different amounts of data and a more realistic set of operations, to see whether I get what you expect. (I probably won't do that, actually -- it's not the sort of thing I need...)
As far as I understand Perl threads, every new interpreter copies everything not explicitly shared. I'd expect that to do even worse for the poster's question.
Thanks for the clarification about threads. I'll grant that my experience with the concept of data sharing across processes is limited. (I'm sure I studied the C functions that create shared memory in Solaris years ago -- and I might even have used them a couple times...) As for threads, I might use them some day, and till then, I guess I should keep my mouth shut about them.

(update: ...um, if the OP happens to have 2GB organized into a few hefty data structures, and those are explicitly shared, why would that be worse than forking? Are the methods for declaring what is shared really unpleasant, or something?)

Replies are listed 'Best First'.
Re^4: Handling multiple clients
by chromatic (Archbishop) on Sep 05, 2004 at 07:11 UTC
    both processes showed up with the same size.

    Yes, but which size? top shows a lot of information. If you don't know what the columns mean, your interpretation can be wildly wrong. Running the following very naive and simple program on my laptop (Linux PPC with a 2.4 kernel) and looking at top shows two processes of about 40 MB apiece -- but the amount of shared memory for each process is exactly the same.

    #!/usr/bin/perl

    use strict;
    use warnings;

    my @foo;
    $foo[10000000] = 1;

    fork and sleep 10;
    sleep 10;

    As for the question of why Perl's ithreads are worse than forking, having seen the ithreads code only a couple of times and not being an expert on memory management by any means, I suspect Perl doesn't take advantage of the COW features of decent kernels. I make this guess because I don't know of any way that Perl could hint to the kernel to share specific memory pages.

      There is always another way to skin a cat.

      Rather than have every thread share the 2GB of data, I would create the server threads before loading it.

      Only the main thread would hold a (non-shared) copy of the 2GB; it would share just the requests and replies with the server threads.

      The advantages: no signals, no reapers, no select, no constantly renewing processes.


      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "Think for yourself!" - Abigail
      "Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon
        This sounds more like what I was after to begin with. Any pointers to code that would achieve that?
Re^4: Handling multiple clients
by Anonymous Monk on Sep 06, 2004 at 12:48 UTC
    Fork will share the memory. If one process modifies a location, only that 4 KB page gets copied and made private to that process. So although top will show all your processes using 2GB of virtual memory, they in fact share the same 2GB of physical memory, provided none of them modifies the data. For further optimisation, you should open+mmap+close the data files instead of doing open+read+close, so that the data files themselves serve as the backing store for your queries. The OS will then optimise the memory as best it can.