declan has asked for the wisdom of the Perl Monks concerning the following question:

I have a socket program that I believe is suffering from input buffering (not output buffering).

I can post example code if you want, but it's easy to describe. I connect/accept a socket between two processes and set the socket to not use buffering ($|=1).

One side writes several messages in a row (let's call them mesgA, mesgB, mesgC), sleeps a while, then sends mesgD. The other side is in a loop that uses select() to look for incoming messages and processes one message each time select() says a message is available.

Eg:

    $rin = '';
    vec($rin, fileno($sd), 1) = 1;
    $nfound = select($rin, undef, undef, 1.0);
    if ($nfound) {
        if ($line = <$sd>) {
            ...
        }
        else {
            ... knows other side closed the socket ...
        }
    }

The above code always sees mesgA, but doesn't see mesgB etc until the other side sends mesgD when select() finally realizes more data is on the socket.

The problem isn't the usual output-buffering problem. I know this because if I modify the receiver to ignore what select() says and simply read from $sd, mesgB and mesgC are always there right away.

I believe the problem is input buffering. select() is probably looking quite literally only at the socket, and mesgB/C probably aren't on the socket anymore, they've probably been buffered by the receiver.
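The suspected effect can be reproduced in isolation. The following is only a minimal sketch, assuming a Unix-like system and using a pipe as a stand-in for the socket (the pipe, the `readable` helper, and the variable names are illustrative, not part of the original program). It asks select() whether the handle is readable before and after a single `<$rd>` call.

```perl
use strict;
use warnings;
use IO::Handle;

# A pipe stands in for the socket (assumption made for this demo).
pipe(my $rd, my $wr) or die "pipe: $!";
$wr->autoflush(1);

# The sender writes three messages back to back.
print {$wr} "mesgA\nmesgB\nmesgC\n";

# Hypothetical helper: poll select() once with a zero timeout.
sub readable {
    my ($fh) = @_;
    my $rin = '';
    vec($rin, fileno($fh), 1) = 1;
    my $nfound = select(my $rout = $rin, undef, undef, 0);
    return $nfound > 0 ? 1 : 0;
}

my $before = readable($rd);   # data is waiting on the descriptor
my $first  = <$rd>;           # buffered read: pulls in everything available
my $after  = readable($rd);   # the descriptor itself is now empty
my $second = <$rd>;           # still served, from Perl's own input buffer

print "before=$before after=$after\n";
print "first=$first", "second=$second";
```

If the diagnosis is right, select() reports the handle readable before the first read and not readable afterwards, even though the second `<$rd>` returns mesgB immediately.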

So now finally to the questions: how can I deal with that? I need to either 1. turn off the input buffering so mesgB/C will still be on the socket where select() will see them and I can be notified of their existence, or 2. have something like select() that lets me know more data is available to read (even if it's not technically on the socket).

Basically how am I supposed to know how much data is sitting there for me to read? I need to read however much is there (without blocking).

Really, I think it should be select()'s job to look at both the socket and the input buffer and return true for that socket if data is available. That's the point of select(): to ask "will a read block or not".

Re: select() and input buffering
by ikegami (Patriarch) on Apr 13, 2011 at 22:41 UTC

    I suspect you are suffering from the problem I described in Re^3: Malfunctioning select() call on a FIFO. You need to use sysread. It reads no more than is currently available, and it doesn't buffer data outside of select's field of vision.

    Note: I use IO::Select below since it's a thin layer that's much easier to use.

        use IO::Select qw( );

        # I like big blocks and I can not lie.
        # Benchmarking might just deny.
        use constant BLK_SIZE => 64*1024;

        my $sel = IO::Select->new($fh);
        my $buf = '';
        while ($sel->can_read()) {
            my $rv = sysread($fh, $buf, BLK_SIZE, length($buf));
            if (!defined($rv)) {
                # ... Handle error. Don't forget $buf might not be empty. ...
                $sel->remove($fh);
            }
            elsif (!$rv) {
                # ... Handle EOF. Don't forget $buf might not be empty. ...
                $sel->remove($fh);
            }

            # Hand every complete line in the buffer to the application.
            while ($buf =~ s/^(.*)\n//) {
                my $msg = $1;
                my $close = process_msg($msg);
                $sel->remove($fh) if $close;
            }
        }

    Of course, if you have multiple handles, you'll need multiple buffers. A hash keyed by fileno is useful.
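One way the per-handle buffers could look is sketched below. This is not code from the thread: it assumes a Unix-like system, uses two pipes in place of sockets, and the names `%buf`, `%name`, and `@seen` are made up for the illustration.

```perl
use strict;
use warnings;
use IO::Handle;
use IO::Select qw( );

use constant BLK_SIZE => 64*1024;

# Two pipes stand in for two sockets (assumption made for this demo).
pipe(my $r1, my $w1) or die "pipe: $!";
pipe(my $r2, my $w2) or die "pipe: $!";
$_->autoflush(1) for $w1, $w2;

print {$w1} "one-a\none-b\npart";   # ends with an incomplete line
print {$w2} "two-a\n";
close($w1);
close($w2);

my $sel  = IO::Select->new($r1, $r2);
my %buf;                                        # fileno => unprocessed bytes
my %name = (fileno($r1) => 'r1', fileno($r2) => 'r2');
my @seen;

while (my @ready = $sel->can_read()) {
    for my $fh (@ready) {
        my $fd = fileno($fh);
        $buf{$fd} //= '';
        my $rv = sysread($fh, $buf{$fd}, BLK_SIZE, length($buf{$fd}));
        warn "read error on $name{$fd}: $!" if !defined($rv);
        if (!$rv) {
            # Error or EOF: report any trailing partial line, then clean up.
            push @seen, "$name{$fd} leftover: $buf{$fd}" if length($buf{$fd});
            $sel->remove($fh);      # remove before close, while fileno is valid
            delete $buf{$fd};
            close($fh);
            next;
        }
        # Extract every complete line from this handle's own buffer.
        while ($buf{$fd} =~ s/^(.*)\n//) {
            push @seen, "$name{$fd}: $1";
        }
    }
}

print "$_\n" for @seen;
```

Keying by fileno keeps each connection's partial line separate, so an incomplete message on one handle never gets mixed into another handle's data. The entry is deleted when the handle is closed because the operating system may reuse the descriptor number.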

    Threads solve this more simply.

      I agree. Once I got the appropriate combination of search terms I eventually found the same thing you described. Essentially that <$sd> is a buffering read and I have to use sysread().

      I've fixed my app now but I'm not real happy with that solution. I had managed to write a fairly large and featureful app over a long weekend due to how simple perl makes so many little coding tasks. But being forced to replace <$sd> with sysread() has bloated my code and probably added a lot more development time.

      It's like chopping the language off at the knees to not have any visibility into those input buffers. From the standpoint of the perl language being as useful as possible I still think the select() should be aware of the input buffers and should have returned true for sockets with buffered input.

      But thanks for your help. I agree with the assessment of the problem and the necessary fix. Thanks.

        Even if select could see the contents of the buffers (which it sometimes does, I believe), you still couldn't use readline (<$fh>).

        I listed two conditions that make sysread suitable. Its non-buffering aspect was just one of them. The other is that it returns as soon as there's data available, so it doesn't block. readline, on the other hand, blocks.

        What makes sysread complicated, the use of buffers, is the result of its non-blocking nature. Its non-blocking nature is the reason you want to use it. If readline was non-blocking, you'd have to jump through the same hoops.
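A small sketch of why the buffer is needed, assuming a Unix-like system and a pipe in place of the socket (the `pump` helper and variable names are hypothetical): sysread hands back whatever has arrived, even half a message, so the caller has to hold on to it until the newline shows up.

```perl
use strict;
use warnings;
use IO::Handle;

# A pipe stands in for the socket (assumption made for this demo).
pipe(my $rd, my $wr) or die "pipe: $!";
$wr->autoflush(1);

my $buf = '';   # bytes received but not yet forming a complete line
my @msgs;       # complete messages, newline stripped

# Hypothetical helper: one sysread, then extract every complete line.
sub pump {
    my $rv = sysread($rd, $buf, 64*1024, length($buf));
    die "sysread: $!" if !defined($rv);
    while ($buf =~ s/^(.*)\n//) {
        push @msgs, $1;
    }
    return $rv;
}

print {$wr} "mes";            # half a message, no newline yet
my $n1      = pump();         # returns with what is there; <$rd> would wait for "\n"
my $pending = $buf;           # the partial message has to be kept somewhere

print {$wr} "gA\nmesgB\n";    # the rest of mesgA plus a whole mesgB
my $n2 = pump();

print "first read: $n1 bytes, pending '$pending'\n";
print "second read: $n2 bytes, messages: @msgs\n";
```

The first call returns with only the fragment, which is exactly the case where `<$rd>` would have sat waiting for the rest of the line.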

        select is inherently complicated. Imagine if you had both readers and writers! Again, if you want simplicity, you want threads.