in reply to select() and input buffering

I suspect you are suffering from the problem I described in Re^3: Malfunctioning select() call on a FIFO. You need to use sysread. It reads no more than is currently available, and it doesn't buffer data outside of select's field of vision.
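To see the symptom in isolation, here is a minimal sketch that uses a pipe as a stand-in for your socket (the pipe and the variable names are assumptions for the demo, not taken from your code). Two lines arrive in one write, readline returns the first and quietly buffers the second, and select then reports nothing to read:

```perl
use strict;
use warnings;
use IO::Select;

# A pipe stands in for the socket (demo assumption).
pipe(my $r, my $w) or die "pipe: $!";

# Two complete lines arrive in a single write.
defined(syswrite($w, "first\nsecond\n")) or die "syswrite: $!";

# readline pulls everything available into Perl's buffer and returns one line.
my $line = <$r>;

# The pipe itself is now empty, so select sees nothing to read...
my @ready = IO::Select->new($r)->can_read(0);

# ...even though another line sits in the buffer and comes back immediately.
my $next = <$r>;

print  "first readline returned: $line";
printf "handles ready afterwards: %d\n", scalar(@ready);
print  "yet readline still had: $next";
```

A select loop built on readline would sit waiting at that point until the peer sent something more.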

Note: I use IO::Select below since it's a thin layer that's much easier to use.

    use IO::Select qw( );

    # I like big blocks and I can not lie.
    # Benchmarking might just deny.
    use constant BLK_SIZE => 64*1024;

    # $fh is your already-open handle; process_msg() is your own handler.
    my $sel = IO::Select->new($fh);
    my $buf = '';
    while ($sel->can_read()) {
        # Append to $buf so a partial line from the last read is kept.
        my $rv = sysread($fh, $buf, BLK_SIZE, length($buf));
        if (!defined($rv)) {
            # Handle error. Don't forget $buf might not be empty.
            $sel->remove($fh);
            next;    # Otherwise the EOF branch would run too.
        }
        if (!$rv) {
            # Handle EOF. Don't forget $buf might not be empty.
            $sel->remove($fh);
            next;
        }

        # Extract every complete line; an incomplete tail stays in $buf.
        while ($buf =~ s/^(.*)\n//) {
            my $msg = $1;
            my $close = process_msg($msg);
            $sel->remove($fh) if $close;
        }
    }

Of course, if you have multiple handles, you'll need multiple buffers. A hash keyed by fileno is useful.
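A rough sketch of that multi-handle variant follows. The two pipes, the `%name` labels, and the `@msgs`/`@leftover` arrays are assumptions made up so the example runs on its own; in a real program the handles would be your sockets and the `push` would be your message handler.

```perl
use strict;
use warnings;
use IO::Select;

use constant BLK_SIZE => 64*1024;

# Two pipes stand in for two sockets (demo assumption).
pipe(my $r1, my $w1) or die "pipe: $!";
pipe(my $r2, my $w2) or die "pipe: $!";
defined(syswrite($w1, "alpha\nbeta\n")) or die "syswrite: $!";
defined(syswrite($w2, "gamma\ndel"))    or die "syswrite: $!";   # ends mid-line
close($w1);
close($w2);

my %name = (fileno($r1) => 'r1', fileno($r2) => 'r2');   # labels for the demo
my %buf;         # fileno => bytes received but not yet processed
my @msgs;        # complete lines, tagged with the handle's label
my @leftover;    # incomplete data still buffered at EOF

my $sel = IO::Select->new($r1, $r2);
while (my @ready = $sel->can_read()) {
    for my $fh (@ready) {
        my $fd = fileno($fh);
        $buf{$fd} //= '';

        my $rv = sysread($fh, $buf{$fd}, BLK_SIZE, length($buf{$fd}));
        warn "read error on $name{$fd}: $!" if !defined($rv);

        if (!$rv) {
            # Error or EOF: report any partial line, then forget the handle.
            push @leftover, "$name{$fd}: $buf{$fd}" if length($buf{$fd});
            $sel->remove($fh);    # remove before close, while fileno is valid
            delete $buf{$fd};
            close($fh);
            next;
        }

        # Each handle's lines are extracted from its own buffer only.
        while ($buf{$fd} =~ s/^(.*)\n//) {
            push @msgs, "$name{$fd}: $1";
        }
    }
}

print "$_\n" for @msgs;
print "unterminated at EOF: $_\n" for @leftover;
```

The order in which the two handles are serviced is not guaranteed, which is exactly why each one needs its own buffer.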

Threads solve this more simply.

Re^2: select() and input buffering
by declan (Initiate) on Apr 14, 2011 at 04:09 UTC
    I agree. Once I got the appropriate combination of search terms I eventually found the same thing you described. Essentially that <$sd> is a buffering read and I have to use sysread().

    I've fixed my app now but I'm not real happy with that solution. I had managed to write a fairly large and featureful app over a long weekend due to how simple perl makes so many little coding tasks. But being forced to replace <$sd> with sysread() has bloated my code and probably added a lot more development time.

    It's like chopping the language off at the knees to not have any visibility into those input buffers. From the standpoint of the perl language being as useful as possible I still think the select() should be aware of the input buffers and should have returned true for sockets with buffered input.

    But thanks for your help. I agree with the assessment of the problem and the necessary fix. Thanks.

      Even if select could see the contents of the buffers (which it sometimes does, I believe), you still couldn't use readline (<$fh>).

      I listed two conditions that make sysread suitable. Its non-buffering aspect was just one of them. The other is that it returns as soon as any data is available, so it doesn't block once select has reported the handle readable. readline, on the other hand, blocks until it has a complete line (or hits EOF).
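A small sketch of that difference, again using a pipe in place of a socket (an assumption for the demo). Only part of a line has arrived when select fires. sysread hands back what is there and returns; a readline at the same point would wait for the newline.

```perl
use strict;
use warnings;
use IO::Select;

use constant BLK_SIZE => 64*1024;

# A pipe stands in for the socket (demo assumption).
pipe(my $r, my $w) or die "pipe: $!";
my $sel = IO::Select->new($r);
my $buf = '';

# Only part of a line has arrived so far.
defined(syswrite($w, "par")) or die "syswrite: $!";

my @ready = $sel->can_read(0);                           # handle is readable
my $n     = sysread($r, $buf, BLK_SIZE, length($buf));   # returns at once
my $have_line = ($buf =~ /\n/) ? 1 : 0;                  # no full line yet

printf "ready=%d, read %d bytes, complete line: %s\n",
    scalar(@ready), $n, $have_line ? 'yes' : 'no';

# The rest of the line shows up later.
defined(syswrite($w, "tial\n")) or die "syswrite: $!";
$sel->can_read(0) or die "expected more data";
sysread($r, $buf, BLK_SIZE, length($buf)) or die "unexpected EOF or error";

# Now the buffer holds a complete line and it can be extracted.
my $msg;
$msg = $1 if $buf =~ s/^(.*)\n//;
print "message: $msg\n";
```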

      What makes sysread complicated, the use of buffers, is the result of its non-blocking nature. Its non-blocking nature is the reason you want to use it. If readline were non-blocking, you'd have to jump through the same hoops.

      select is inherently complicated. Imagine if you had both readers and writers! Again, if you want simplicity, you want threads.

        For this app I'd be okay with that. The logic of my app guarantees that enough data to satisfy the readline will be on its way, or else the socket will be closing, which justifies an empty readline. I don't expect select to guarantee me that all the data I could possibly want is available, just to notify me that some data is there. I can agree that sysread() is better, though, for complete safety in the presence of network hiccups.

        But there's a worse problem I think. Suppose the first thing I do after opening a socket is exchange some info back and forth and then I put the socket in my main loop where select is used. I'd like to use the easy readline coding style for those initial exchanges but unless I'm missing something I can't do that. If at any point in the future of the program my socket will be used with select() then I must not use any buffering read call earlier in the socket's life.

        That's a very harsh restriction. Is there any way around it?
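One possible way around it, offered only as a sketch: keep the readline coding style for the early exchanges, but get it from a small helper built on sysread that shares its buffer with the select loop. `read_line` below is a hypothetical helper, not a core function, and the pipe and message contents are assumptions for the demo.

```perl
use strict;
use warnings;
use IO::Select;

use constant BLK_SIZE => 64*1024;

# Hypothetical helper: a blocking "read one line" built on sysread.
# Unconsumed bytes stay in $$buf_ref, where the select loop can see them.
sub read_line {
    my ($fh, $buf_ref) = @_;
    until ($$buf_ref =~ /\n/) {
        my $rv = sysread($fh, $$buf_ref, BLK_SIZE, length($$buf_ref));
        die "read error: $!" if !defined($rv);
        return if !$rv;    # EOF before a full line arrived
    }
    $$buf_ref =~ s/^(.*)\n//;
    return $1;
}

# A pipe stands in for the socket (demo assumption). The peer's greeting
# and its first two messages arrive in a single burst.
pipe(my $r, my $w) or die "pipe: $!";
defined(syswrite($w, "HELLO\nmsg1\nmsg2\n")) or die "syswrite: $!";
close($w);

my $buf = '';
my $greeting = read_line($r, \$buf);    # the easy style for the handshake

my @msgs;
my $sel = IO::Select->new($r);
while (1) {
    # Drain complete lines first: some may have arrived during the handshake.
    push @msgs, $1 while $buf =~ s/^(.*)\n//;

    my @ready = $sel->can_read() or last;
    my $rv = sysread($r, $buf, BLK_SIZE, length($buf));
    die "read error: $!" if !defined($rv);
    $sel->remove($r) if !$rv;           # EOF
}

print "greeting: $greeting\n";
print "message: $_\n" for @msgs;
```

The point of the design is that there is only one buffer per handle and it lives in your own code, so nothing read during the handshake can end up hidden from the main loop.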