in reply to Re^4: sysread and syswrite in tcp sockets
in thread sysread and syswrite in tcp sockets

Hi, thanks for the feedback. The reason why I have to do it this way is:

first, I don't know the length of the incoming data; it is of variable size.

second, the client holds a persistent connection to the server, which means binary data keeps arriving at intervals. As far as I know, sysread will block if no data is coming, so there is no way I could receive 0 to indicate end of file.

I understand that the timeout can backfire because network conditions vary. Is there a reliable way I can definitely receive the data as a whole, given the conditions above?

Thanks!

Re^6: sysread and syswrite in tcp sockets
by gone2015 (Deacon) on Dec 28, 2008 at 01:01 UTC

    If you are confident that the network and the client are reliable, then the problem is more straightforward.

    When the client opens a connection, sends all the data it's going to send in short order, and then closes the connection, the server will hear the close and will know that is the end of the data. sysread will return zero, indicating "eof" -- it will not block if everything has been read and the far end has closed their end of the connection. You don't need the timeout (whose purpose would be to detect some failure such that packets stop arriving).

    If things really are this simple, it would be easier to use a single read (assuming you know the maximum amount of data to be received), which will return once it has read everything sent and the far end has closed.
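A minimal sketch of that single-read case: one buffered read drains the stream up to eof, because read keeps refilling until it has LENGTH bytes or the far end closes. (slurp_connection and MAX_EXPECTED are made-up names for illustration, not from the thread.)

```perl
use strict;
use warnings;

use constant MAX_EXPECTED => 2**20;   ## assumed upper bound on the data size

sub slurp_connection {
    my ($sock) = @_;
    my $data = '';
    ## read() loops internally until it has MAX_EXPECTED bytes or hits
    ## eof, so after the far end closes we have everything that was sent.
    read( $sock, $data, MAX_EXPECTED );
    return $data;
}
```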

      Hi, sorry for the late reply..

      the connection will be persistent, i.e. neither side will ever close the connection (unless there is a network error). The thing is, if I implement the same kind of code in C#, it doesn't seem to have this problem. But in Perl, since sysread does partial reads, this problem surfaces..

      so is there a more reliable solution than the one proposed above?

        The typical solution is for the sender to prefix the data with the number of bytes. Then the reader can read (say) a 4 byte integer indicating how many bytes will follow, and then issue a read for the appropriate number of bytes, or better, loop reading reasonably sized chunks and counting until it has it all:

        ## sender
        my $size = -s 'theFileToSend';
        print $socket pack 'N', $size;    ## Assumes files less than 4GB.

        local $/ = \2**16;                ## read the file in 64k chunks
        print $socket $_ while <$file>;

        ## The reader
        my $size;
        read( $sock, $size, 4 );          ## Get the size in binary
        $size = unpack 'N', $size;

        while( $size > 0 ) {
            my $buffer;
            my $chunkSize = $size < 2**16 ? $size : 2**16;
            $size -= read( $sock, $buffer, $chunkSize );
            ## do something with this chunk
        }
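Since sysread may return fewer bytes than asked for (the partial reads complained about above), a robust reader loops until it has the full count. A sketch, where read_exactly is a made-up helper name:

```perl
use strict;
use warnings;

## Read exactly $len bytes from $sock, looping over partial sysreads.
## Returns the bytes, or undef on error or eof before $len bytes arrive.
sub read_exactly {
    my ($sock, $len) = @_;
    my $buf = '';
    while ( length($buf) < $len ) {
        my $got = sysread( $sock, $buf, $len - length($buf), length($buf) );
        return undef if !defined $got;   ## read error
        return undef if $got == 0;       ## eof before we had it all
    }
    return $buf;
}

## Reader using the length prefix (sketch):
##   my $hdr  = read_exactly( $sock, 4 ) or die "short read";
##   my $size = unpack 'N', $hdr;
##   my $data = read_exactly( $sock, $size );
```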

        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.

        OK. When you say persistent I assume that means that a number of separate chunks of data are going to be sent. What you need is a way to signal to the client the end of each chunk of data.

        As indicated elsewhere, one way is to prefix each chunk with its length -- and do two simple reads per chunk, one for the length and one for the actual chunk -- ie an in-band signal of the end of each chunk. (If the sender closed the connection after each chunk, that would be an effective out-of-band signal.) (For completeness: if you cannot tell how long each chunk is before starting to send, you could break it into sub-chunks, precede each one by its length and send an empty sub-chunk at the end.)
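The sub-chunk idea could be sketched like this, assuming a 4-byte network-order length before each sub-chunk and a zero length as the terminator (send_chunk and recv_chunk are illustrative names, not an established API):

```perl
use strict;
use warnings;

## Sender: each piece goes out as <4-byte length><bytes>; a zero-length
## sub-chunk marks the end of the whole chunk.
sub send_chunk {
    my ($sock, @pieces) = @_;
    for my $piece (@pieces) {
        print {$sock} pack('N', length $piece), $piece;
    }
    print {$sock} pack('N', 0);     ## empty sub-chunk: end of this chunk
}

## Reader: accumulate sub-chunks until the zero-length terminator.
## Returns the reassembled chunk, or undef on a short read.
sub recv_chunk {
    my ($sock) = @_;
    my $chunk = '';
    while (1) {
        read( $sock, my $hdr, 4 ) == 4 or return undef;
        my $len = unpack 'N', $hdr;
        last if $len == 0;          ## terminator
        read( $sock, my $body, $len ) == $len or return undef;
        $chunk .= $body;
    }
    return $chunk;
}
```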

        However, as this connection will persist for some time, the possibility of getting a network error (or a machine dying or being turned off or some other disturbance) increases. So I would go back to the (slightly) more involved sysread and deal with the possibility of errors.

        Now you have two other possibilities:

        1. in-band: if the data allows it, send an "end-of-chunk" marker between each chunk. Now the reader needs to check the result of each sysread for the marker -- noting that the marker may not be at the end of each sysread's result, though it probably will be.

        2. out-of-band: time -- ensure that the sender will send each chunk in a rapid sequence of packets (so no more than 's' seconds between packets), and then wait at least 't' seconds before sending the next chunk. Now, once a chunk has started, the reader concludes that a chunk is complete if it has to wait more than 'r' seconds for further data. To cope with variable delays introduced by the system, you'll want the time 's' to be significantly less than 'r', which in turn is significantly less than 't' -- where what is significant depends on your system (network and client and server) and the application (how long 't' is and how long you can make 'r' without violating any latency requirement).
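For option (1), the usual shape is to append each sysread's result to a buffer and then scan the buffer for the marker, which also handles a marker that straddles two sysreads. A sketch -- the marker byte sequence "\x00END\x00" is made up for illustration and must be something that cannot occur in the data:

```perl
use strict;
use warnings;

my $MARKER = "\x00END\x00";   ## illustrative in-band end-of-chunk marker

## Pull every complete chunk out of the buffer referenced by $bufref,
## leaving any trailing partial chunk in place for the next sysread.
sub extract_chunks {
    my ($bufref) = @_;
    my @chunks;
    while ( (my $pos = index($$bufref, $MARKER)) >= 0 ) {
        push @chunks, substr($$bufref, 0, $pos);
        substr($$bufref, 0, $pos + length $MARKER) = '';
    }
    return @chunks;
}

## Reader loop (sketch):
##   my $buf = '';
##   while ( sysread($sock, $buf, 2**16, length $buf) ) {
##       for my $chunk ( extract_chunks(\$buf) ) { ... }
##   }
```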

        You say you've achieved this using C# quite readily. I guess it is implicitly doing (2) above, where 's' is small -- perhaps because each chunk is not very big and is sent in a single Send and the system is lightly loaded. So 'r' can be quite small. (And 't' is implicitly bigger than 'r'.)
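In Perl, option (2) could be sketched with select via IO::Select: block for the first byte of a chunk, then treat a pause longer than 'r' seconds as end-of-chunk. read_timed_chunk and the 64k buffer size are illustrative choices, and $r is the tuning parameter discussed above:

```perl
use strict;
use warnings;
use IO::Select;

sub read_timed_chunk {
    my ($sock, $r) = @_;
    my $sel   = IO::Select->new($sock);
    my $chunk = '';
    while (1) {
        ## Block indefinitely for the first byte; once the chunk has
        ## started, wait at most $r seconds before declaring it complete.
        my $timeout = length($chunk) ? $r : undef;
        last unless $sel->can_read($timeout);
        my $got = sysread( $sock, $chunk, 2**16, length $chunk );
        last unless $got;           ## eof or error also ends the chunk
    }
    return $chunk;
}
```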

        Having said all that, using time as the end-of-chunk signal can fail. If the system becomes busy during the transmission of a chunk, delaying a packet for 'r' seconds or more, the receiver will be fooled into thinking that the chunk is complete, and will treat the rest of the chunk as the start of another -- giving you two broken chunks. If the sender (or the network between receiver and sender) fails part way through a chunk, the receiver will also be fooled. So... we're in the realm of what is reliable enough... if 's' is very small and 'r' is small, and the system is generally reliable, and the chunks are short (so failures and delays are unlikely to interrupt a chunk in progress), then this may be good enough. But, unless you have some other way of detecting broken chunks, there remains the danger of a "silent" error.

        You ask for "reliable". If you want 100% reliable, you need any one of the in-band end-of-chunk signals, or some other way of detecting broken chunks. Otherwise, subject to all the caveats above, you can use time as the signal -- having verified for yourself, on your system, that it is reliable enough for your needs.