MediocreGopher has asked for the wisdom of the Perl Monks concerning the following question:

I'm using the Socket module to read data from http servers. I can connect to them and write requests to them successfully, but receiving the answers to those requests has proven to be challenging. Basically I have sysread read from the socket until it doesn't have anything to read using the following loop:

do { $bytes_read = sysread(DST, $data, 1024, length($data)); } while ($bytes_read == 1024);

which I "borrowed" from another thread here on perlmonks. The problem I'm facing (I believe) is that the server isn't dishing out information fast enough; sysread gets info, sticks it onto $data, and comes back to the socket before the server can get more information to the socket. Sysread then finds an empty socket and the loop ends prematurely. Does anyone have any tricks for dealing with this (or am I completely wrong and the problem is my own stupidity)?

Re: Getting sysread to read the full packet
by almut (Canon) on May 12, 2010 at 07:25 UTC

    Why are you testing while ($bytes_read == 1024) and not simply while ($bytes_read), which would continue as long as at least one byte has been read...?

    P.S.: are you aware that there are higher level modules such as LWP for doing HTTP, which encapsulate all those nasty little details?
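
    A minimal sketch of the loop suggested above, assuming the peer closes the connection once it has sent everything. The helper name read_until_eof and the local socket pair standing in for the server are illustrative, not from the original post:

```perl
use strict;
use warnings;
use Socket;

# read_until_eof is a hypothetical helper: keep appending to $data until
# sysread reports EOF (returns 0) or an error (returns undef).
sub read_until_eof {
    my ($sock) = @_;
    my $data = '';
    while (1) {
        my $n = sysread($sock, $data, 1024, length($data));
        die "sysread failed: $!" unless defined $n;
        last if $n == 0;    # the peer closed its end
    }
    return $data;
}

# Demo: a local socket pair stands in for the connection to the server.
socketpair(my $reader, my $writer, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
syswrite($writer, 'x' x 3000);    # more than one 1024-byte read
close($writer);                   # the reader now sees EOF after the data

my $reply = read_until_eof($reader);
print length($reply), "\n";
```

    Note that a short read (fewer than 1024 bytes) does not end this loop; only EOF or an error does.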

Re: Getting sysread to read the full packet
by ikegami (Patriarch) on May 12, 2010 at 13:32 UTC

    TCP provides you with a stream of bytes. It's your job to convert that into a stream of messages. To do that, you need to define what constitutes a message. If the messages aren't fixed-length, the transmitted data itself must provide information allowing you to isolate each message.

    For example, each message could end with a sentinel value, such as a newline. If so, you'd use something like the following:

    my $buf = '';
    for (;;) {
        my $rv = sysread($sock, $buf, 64*1024, length($buf));
        die $! if !defined($rv);
        last if !$rv;

        while ($buf =~ s/^([^\n]*\n)//) {
            process_msg("$1");
        }
    }

    die("Partial message") if length($buf);

    Or you could prefix each message with the length of the message. If so, you'd use something like the following:

    my $buf = '';
    my $want;
    for (;;) {
        my $rv = sysread($sock, $buf, 64*1024, length($buf));
        die $! if !defined($rv);
        last if !$rv;

        for (;;) {
            if ($want) {
                last if length($buf) < $want;
                process_msg(substr($buf, 0, $want, ''));
                $want = 0;
            } else {
                last if length($buf) < 4;
                $want = unpack('N', substr($buf, 0, 4, ''));
            }
        }
    }

    die("Partial message") if $want || length($buf);

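
    For the length-prefixed scheme, the sending side has to produce a matching 4-byte big-endian prefix. The following is only a sketch under that assumption; frame_msg and unframe_all are hypothetical helper names, and unframe_all works on an in-memory buffer rather than a socket so the round trip can be run on its own:

```perl
use strict;
use warnings;

# frame_msg (hypothetical): prefix a message with its length as a 4-byte
# big-endian integer, which is what unpack('N', ...) expects on the reader.
sub frame_msg {
    my ($msg) = @_;
    return pack('N', length($msg)) . $msg;
}

# unframe_all (hypothetical): pull every complete message out of a buffer.
# Returns a reference to the list of messages and the unconsumed remainder.
sub unframe_all {
    my ($buf) = @_;
    my @msgs;
    while (length($buf) >= 4) {
        my $want = unpack('N', $buf);
        last if length($buf) < 4 + $want;    # message incomplete, need more data
        substr($buf, 0, 4, '');              # drop the length prefix
        push @msgs, substr($buf, 0, $want, '');
    }
    return (\@msgs, $buf);
}

# Three framed messages followed by two stray bytes of an unfinished prefix.
my $stream = frame_msg('hello') . frame_msg('') . frame_msg('world!');
my ($msgs, $rest) = unframe_all($stream . "\0\0");
print scalar(@$msgs), " messages, ", length($rest), " bytes left over\n";
```

    Checking the buffer length against 4 + $want, rather than testing whether $want is true, means an empty message is still delivered.
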
Re: Getting sysread to read the full packet
by MediocreGopher (Initiate) on May 12, 2010 at 17:26 UTC

    Why are you testing while ($bytes_read == 1024) and not simply while ($bytes_read), which would continue as long as at least one byte has been read...?

    For some reason while ($bytes_read) never stops; it just sits there forever. Everyone else seems to be able to use it fine, but I'm having trouble with it.
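
    One possible explanation, stated as an assumption rather than a diagnosis: if the server keeps the connection open after answering (as HTTP/1.1 servers commonly do), a blocking sysread never sees EOF and simply waits for more data. The sketch below reproduces that situation on a local socket pair and uses IO::Select so that sysread is only called when data is ready. The 0.5 second idle timeout is an arbitrary value chosen for the demo:

```perl
use strict;
use warnings;
use Socket;
use IO::Select;

# A local socket pair stands in for the client/server connection.
socketpair(my $client, my $server, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";

# The "server" answers but does NOT close the connection.
syswrite($server, "HTTP/1.1 200 OK\r\n\r\n");

my $sel          = IO::Select->new($client);
my $data         = '';
my $idle_timeout = 0.5;    # seconds; arbitrary for this demo

# Only call sysread when select reports readable data, and give up once
# the connection has been idle for $idle_timeout seconds.
while (my @ready = $sel->can_read($idle_timeout)) {
    my $n = sysread($client, $data, 1024, length($data));
    die "sysread failed: $!" unless defined $n;
    last if $n == 0;       # a real EOF would end the loop here
}

print "got ", length($data), " bytes; the connection is still open\n";
```

    A timeout is a heuristic only: it cannot tell a finished message from a slow sender, which is why the framing question matters.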

    And I don't want to use LWP because I won't always be working with http; it's just what I'm using to test the script for now.

    ikegami: The problem is that I don't necessarily know what protocols my server will be interacting with (could be http, ftp, torrents, anything really). What I'm doing is basically coding a socks5 proxy (of sorts). I have no control over whether the remote host is going to tell me how long the message is, or whether it ends with a specific character.

    I've been looking at Michael Auerswald's perl socks5 implementation, and the relevant part of the code is this:

    if ($client && (vec($eout, fileno($client), 1) || vec($rout, fileno($client), 1))) {
        my $result = sysread($client, $tbuffer, 1024);
        if (!defined($result) || !$result) {
            return;
        }
    }

    if ($target && (vec($eout, fileno($target), 1) || vec($rout, fileno($target), 1))) {
        my $result = sysread($target, $cbuffer, 1024);
        if (!defined($result) || !$result) {
            return;
        }
    }

    while (my $len = length($tbuffer)) {
        my $res = syswrite($target, $tbuffer, $len);
        if ($res > 0) {
            $tbuffer = substr($tbuffer, $res);
        } else {
            return;
        }
    }

    while (my $len = length($cbuffer)) {
        my $res = syswrite($client, $cbuffer, $len);
        if ($res > 0) {
            $cbuffer = substr($cbuffer, $res);
        } else {
            return;
        }
    }

    So he basically reads a set amount, sends it off, then goes back and reads again, and the delay which I need would be created by the sending of the new information to the next machine. This will work for what I need, but is it really all that reliable?
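
    The read-a-chunk, forward-a-chunk approach can be sketched with IO::Select instead of raw vec() bitmaps. This is not Michael Auerswald's code; relay and write_all are hypothetical names, and the socket pairs exist only so the example runs without a network. Because a proxy forwards bytes as they arrive, it never needs to know where a message ends:

```perl
use strict;
use warnings;
use Socket;
use IO::Select;

# write_all (hypothetical): syswrite may accept fewer bytes than offered,
# so keep writing until the whole chunk has gone out.
sub write_all {
    my ($fh, $buf) = @_;
    while (length $buf) {
        my $n = syswrite($fh, $buf);
        return 0 unless $n;             # error, or nothing written
        substr($buf, 0, $n, '');
    }
    return 1;
}

# relay (hypothetical): copy bytes in both directions between $client and
# $target until either side closes or an error occurs.
sub relay {
    my ($client, $target) = @_;
    my %peer = (fileno($client) => $target, fileno($target) => $client);
    my $sel  = IO::Select->new($client, $target);
    while (my @ready = $sel->can_read) {    # blocks until something is readable
        for my $fh (@ready) {
            my $n = sysread($fh, my $chunk, 1024);
            return unless $n;               # undef = error, 0 = EOF
            return unless write_all($peer{ fileno($fh) }, $chunk);
        }
    }
}

# Demo: $a_end plays the SOCKS client, $b_end plays the remote host.
socketpair(my $a_end, my $client, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
socketpair(my $target, my $b_end, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";

syswrite($a_end, "ping");
close($a_end);                  # relay returns once it sees this EOF
relay($client, $target);

sysread($b_end, my $got, 1024);
print "$got\n";
```

    The writes here are blocking, so a slow receiver stalls the other direction too; a more robust version would also select for writability.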