DBX has asked for the wisdom of the Perl Monks concerning the following question:

I wrote a system which uses Storable::freeze() to serialize a data structure. It then uses IPC::Open2 and IO::Handle to spawn multiple child processes and write to them, reading back a status from each one.

When this runs, the children are causing huge CPU load (according to top), even when nothing is being read. I can't figure out why, probably due to my inexperience with IPC. I have RTFM'ed just about every IPC document I could get my hands on. There are multiple parents running at any given time and each parent will spawn any number of children. Right now, I'm experimenting with 10 children per parent. Some code for your review is below.

My question is: Is there something I'm missing in this code that would cause the CPUs to churn even when simply selecting and reading? Or is there something else I'm just not seeing? Snippets from the parent:
    my ($readHandle, $writeHandle) = (IO::Handle->new, IO::Handle->new);
    my $pid = open2($readHandle, $writeHandle, @command);
    $readHandle->autoflush(1);
    $readHandle->blocking(0);
    $writeHandle->autoflush(1);
    $writeHandle->blocking(0);
and later in the code...
    ## $messageForProcessing is a data structure stored as a hash reference
    my $frozenMessage = freeze($messageForProcessing);
    print $writeHandle $frozenMessage . qq|\n| . $this->_messageDelimiter();
The children that are spawned have a method to read the data from the parent and process it (modified a bit to remove business logic):
    sub processMessages {
        my ($this) = @_;
        my $endTime = time + $ENV{'MAX_LIFETIME'};

        select STDOUT;
        $|++;    # make unbuffered

        my $select = IO::Select->new();
        $select->add(\*STDIN);

        my $messageDelimiter = $this->_messageDelimiter();
        my $delimiterMatch = qr{\n$messageDelimiter$};
        my $frozenMessage;
        my $messageToProcess = {};
        my $stopNow = 0;

        while(time < $endTime) {
            my $offset;
            my ($handle) = $select->can_read(5);
            my $previousRead;

            while( ($handle) && (my $bytes = $handle->sysread($frozenMessage, 8192, $offset)) ) {
                $offset += $bytes;

                ## Because Storable may put newlines in the frozen object,
                ## delimit messages:
                my $searchableText = $previousRead . $frozenMessage;
                if($searchableText =~ /$delimiterMatch/) {
                    $frozenMessage =~ s/$delimiterMatch//;
                    $messageToProcess = thaw($frozenMessage);
                }
                $previousRead = $frozenMessage;

                if(%$messageToProcess) {
                    ### DO MESSAGE PROCESSING HERE ##
                    my $outputMessage;
                    if($processor->hasErrors()) {
                        my $errors = $processor->getErrors();
                        $outputMessage = qq|$$ ERROR $errors->[0]\n|;
                    }
                    else {
                        my $messageID = $processor->lastMessageID();
                        $outputMessage = qq|$$ SUCCESS $messageID\n|;
                    }

                    ## Write status back to the parent
                    print STDOUT $outputMessage;

                    $offset = 0;
                    undef($frozenMessage);
                    undef($previousRead);
                    undef(%$messageToProcess);
                }
            }
        }
    }
At any given time, at least 5 children are running using 10% or more of CPU, up to 95%. strace on their processes shows that this might be the case even when they are not reading any data:
    Process 22515 attached - interrupt to quit
    select(8, [0], NULL, NULL, {5, 0}) = 1 (in [0], left {5, 0})
    read(0, "", 8192) = 0
    select(8, [0], NULL, NULL, {5, 0}) = 1 (in [0], left {5, 0})
    read(0, "", 8192) = 0
    select(8, [0], NULL, NULL, {5, 0}) = 1 (in [0], left {5, 0})
    read(0, "", 8192) = 0
    select(8, [0], NULL, NULL, {5, 0}) = 1 (in [0], left {5, 0})
    read(0, "", 8192) = 0
    select(8, [0], NULL, NULL, {5, 0}) = 1 (in [0], left {5, 0})
    read(0, "", 8192) = 0
    select(8, [0], NULL, NULL, {5, 0}) = 1 (in [0], left {5, 0})
    read(0, "", 8192) = 0
    select(8, [0], NULL, NULL, {5, 0}) = 1 (in [0], left {5, 0})
    read(0, "", 8192) = 0
    select(8, [0], NULL, NULL, {5, 0}) = 1 (in [0], left {5, 0})
    read(0, "", 8192) = 0

Replies are listed 'Best First'.
Re: Performance and CPU load: sysread, IO::Select and/or Storable::thaw
by jethro (Monsignor) on Jun 28, 2010 at 16:17 UTC
    It seems you are turning off blocking on the filehandles. I might be wrong, but if the handles don't block, then "$select->can_read(5);" should not wait 5 seconds but return instantly (because, as I understand it, can_read (via select()) tries a read on a filehandle and blocks until it can read or a timer runs out).

    You can check that easily by putting a "print ++$x;" statement near the can_read line and seeing whether there is a 5 second wait between the numbers. Then turn on blocking reads and see if anything changes.

    UPDATE: Read the select() man page (http://linux.die.net/man/2/select). It says that select returns when you can read a filehandle without blocking.
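One consequence of that man page wording is worth spelling out: at end of file a read does not block either, so select() also reports a handle as readable once the other end has been closed. The strace output in the question shows read() returning 0, which is how end of file is reported. The following is a minimal sketch of that behaviour on a Unix-like system; `probe_closed_pipe` is a hypothetical name used only for illustration, and the pipe stands in for the child's STDIN.

```perl
use strict;
use warnings;
use IO::Select;

# Hypothetical helper: open a pipe, close the write end, then ask select()
# whether the read end is readable and try to read from it.
# Returns (number of ready handles, bytes read, seconds spent waiting).
sub probe_closed_pipe {
    my ($timeout) = @_;

    pipe(my $reader, my $writer) or die "pipe failed: $!";
    close $writer;    # simulate the writing process closing its end

    my $select = IO::Select->new($reader);
    my $start  = time;

    # End of file counts as "readable", so this returns without waiting.
    my @ready = $select->can_read($timeout);

    # sysread returns 0 at end of file (undef would indicate an error).
    my $bytes = sysread($reader, my $buffer, 8192);

    my $elapsed = time - $start;
    close $reader;
    return (scalar @ready, $bytes, $elapsed);
}
```

If the writer of a pipe has gone away, a loop that keeps calling can_read and sysread on the read end will spin without ever sleeping, which would look like the select/read pattern in the strace output.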

      Your reply makes perfect sense. The parent runs for quite a long time and starts only under very specific circumstances. It also works with some legacy logic, so it's hard to test. I have, however, modified the child to remove the 5 second timeout from can_read(5) because the IO::Select docs say:

      If "TIMEOUT" is not given and any handles are registered then the call will block.

      I also added the line:

      my ($handle) = $select->can_read(5);
      $handle->blocking(1);

      This had no effect either, presumably because the handle to \*STDIN had already been created. I also tried setting that handle to blocking(1) before any loops start, which also had no effect:

      my @handles = $select->handles();
      $handles[0]->blocking(1);

      I'm concerned that even if I set the read handle to block in the parent when open2() is called it still won't work. Perhaps I need both the write handle and read handle to block?

      Forgive my IPC ignorance here, but what I'm trying does not seem to match what I've read and I tried to do an extremely thorough job of research and testing.
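On the question of which handles need to block: on Unix-like systems the non-blocking flag belongs to each end of a pipe separately, so changing it on one end does not change the other. The sketch below is an illustration under that assumption, using a plain pipe() in place of the pipes that open2() creates; `pipe_end_modes` is a hypothetical name, not something from the original code.

```perl
use strict;
use warnings;
use IO::Handle;

# Hypothetical helper: create a pipe, make only the write end non-blocking,
# and report the blocking mode of both ends (1 = blocking, 0 = non-blocking).
sub pipe_end_modes {
    pipe(my $reader, my $writer) or die "pipe failed: $!";

    $writer->blocking(0);    # affects the write end only

    # Called without an argument, blocking() returns the current setting.
    my $reader_mode = $reader->blocking ? 1 : 0;
    my $writer_mode = $writer->blocking ? 1 : 0;

    close $reader;
    close $writer;
    return ($reader_mode, $writer_mode);
}
```

If this holds for the open2() pipes as well, the blocking(0) calls in the parent would not by themselves make the child's STDIN non-blocking.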

        I would have commented out any calls to $handle->blocking(0). The default mode is blocking, so that should work, and work even after forks.

        Naturally, that could cause other problems if some other part of the code relies on non-blocking reads, but it would only be for testing the hypothesis.

        The timeout of 5 seconds is quite OK. If the loop only runs every 5 seconds, you will never notice it. What the docs mean is that if you don't specify a TIMEOUT value, the timeout will be infinite. But 5 seconds is practically infinity as far as CPU load is concerned.

        So comment out or remove the blocking(0) calls, and add print lines around your can_read calls to check whether they wait for 5 seconds. If they still don't wait, check how many bytes they read from the pipe; maybe the code that writes to the pipe is broken and floods it with data.
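Checking the byte count also covers the case where the count is zero. A rough sketch of a read loop that inspects the sysread result and stops at end of file is shown below. It is not the original child code: `read_until_eof`, its parameters, and the use of a plain string delimiter are assumptions made for illustration, and it expects a blocking handle.

```perl
use strict;
use warnings;
use IO::Select;

# Hypothetical helper: read delimiter-terminated messages from $handle and
# pass each one to $on_message. Stops at end of file or when $deadline
# (an epoch time) is reached. Returns the number of messages delivered.
sub read_until_eof {
    my ($handle, $delimiter, $on_message, $timeout, $deadline) = @_;

    my $select = IO::Select->new($handle);
    my $buffer = '';
    my $count  = 0;

    while (time < $deadline) {
        my ($ready) = $select->can_read($timeout);
        next unless $ready;    # timed out with nothing to read

        # Append new data to whatever is already buffered.
        my $bytes = sysread($ready, $buffer, 8192, length($buffer));
        die "sysread failed: $!" unless defined $bytes;
        last if $bytes == 0;   # end of file: the writer closed its end

        # Hand over every complete message currently in the buffer.
        while ((my $end = index($buffer, $delimiter)) >= 0) {
            my $message = substr($buffer, 0, $end);
            substr($buffer, 0, $end + length($delimiter), '');
            $on_message->($message);
            $count++;
        }
    }
    return $count;
}
```

A child could call it along the lines of `read_until_eof(\*STDIN, $delimiter, sub { ... }, 5, time + $ENV{'MAX_LIFETIME'})`. Leaving the loop on a zero-byte read means the process no longer polls a closed pipe until its lifetime expires.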