CapitaineCaverne has asked for the wisdom of the Perl Monks concerning the following question:

Hi,
I have an issue with socket programming: I can't read the end of the data on a socket, even with sysread.
Here is the (very) simple code that shows the issue:
use strict;
use IO::Socket;

my $host = IO::Socket::INET->new(
    PeerAddr => 'www.smartadserver.com',
    PeerPort => '80',
    Proto    => "tcp",
    Type     => SOCK_STREAM,
);

print $host <<EOM;
GET /call/pubj/445/3197/138/M/5249542624/target? HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/x-shockwave-flash, application/xaml+xml, application/vnd.ms-xpsdocument, application/x-ms-xbap, application/x-ms-application, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, */*
Accept-Language: fr
UA-CPU: x86
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30; InfoPath.2)
Host: www.smartadserver.com
Proxy-Connection: Keep-Alive

EOM

my $byte;
while (sysread($host,$byte,1)==1) {
    print $byte;
}
What I should see, along with the HTTP response headers, is the HTML that follows them. However, here is what I get on the screen:
HTTP/1.1 302 Object moved
Date: Tue, 01 Jul 2008 20:08:00 GMT
Server: Microsoft-IIS/6.0
P3P: CP="BUS CUR CONo FIN IVDo ONL OUR PHY SAMo TELo"
pragma: no-cache
cache-control: private
Location: /def/def/showdef.asp
Content-Length: 210
Content-Type: text/html
Expires: Mon, 30 Jun 2008 20:08:00 GMT
Set-Cookie: sasd=%24a%3D78t%3B%24cn%3DFR%5FA8%3B%24isp%3D102%3B%24qt%3D184%5F1338%5F12468t; path=/
Set-Cookie: pdomid=5; path=/
Set-Cookie: TestIfCookieP=ok; expires=Fri, 26-Nov-2010 23:00:00 GMT; domain=smartadserver.com; path=/
Set-Cookie: TestIfCookie=ok; domain=smartadserver.com; path=/
Set-Cookie: ASPSESSIONIDSCCSCBCA=HGFAJNDBEBFKPPGMPLNOOJBM; path=/
Cache-control: no-cache

****** Surprisingly, HTML DATA is missing here...*****
And tcpdump does show the HTML code, so it really is being sent! On top of this, if I do it manually (i.e. open a telnet session to port 80 on www.smartadserver.com and paste the same HTTP request that the Perl program sends), I get the full response (HTTP headers + HTML code).
I am really puzzled, particularly because I took care to read byte by byte (using sysread) so that the data would not need to end with an EOL.
I've already spent a complete day looking at this and I would appreciate any hints ;-).
Thanks
S.

Replies are listed 'Best First'.
Re: Can't read end data out of a socket
by pc88mxer (Vicar) on Jul 01, 2008 at 20:42 UTC

    Update: this is a simple "suffering from buffering" problem. Just turn on autoflush with $| = 1 and you'll see the HTML body.
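A minimal sketch of the same read loop with autoflush enabled on the output handle. The helper name copy_bytes_unbuffered is hypothetical and not part of the original program; the autoflush method comes from IO::Handle and does for one handle what $| = 1 does for the currently selected handle.

```perl
use strict;
use warnings;
use IO::Handle;    # provides the autoflush method on filehandles

# Copy everything from $in to $out one byte at a time, as the original
# loop does, but flush $out after every print.
# (copy_bytes_unbuffered is a hypothetical helper name.)
sub copy_bytes_unbuffered {
    my ($in, $out) = @_;
    $out->autoflush(1);    # per-handle equivalent of setting $| = 1
    my $copied = 0;
    while (1) {
        my $n = sysread($in, my $byte, 1);
        last unless $n;    # undef on error, 0 at end of file
        print {$out} $byte;
        $copied++;
    }
    return $copied;
}
```

In the original program this would be called as copy_bytes_unbuffered($host, \*STDOUT). On a keep-alive connection the loop still blocks once the server has nothing more to send, but everything received so far is already on the screen.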

    You got a 302 response, which means that the server is redirecting you to another page (see the Location: header line).

    I looked at what wget does, and it seems to just stop reading after getting the header. Just look at the stderr output:

    wget -d http://www.smartadserver.com/call/pubj/445/3197/138/M/5249542624/target? 2> /tmp/out
    After reading the header it says:
    ---response end---
    302 Object moved
    Registered socket 3 for persistent reuse.
    So I don't think there is a body (in spite of the Content-Length: 210 header.)
      pc88mxer is right about buffering. You are reading all of the data, but it's not being printed. Perl (like the C STDIO library) saves up output until it has a full buffer of data or, if it's printing to a terminal, a full line. Because the end of the server's response neither fills the buffer nor ends in a newline character, Perl won't flush the buffer until STDOUT is closed. And because HTTP 1.1 uses connection keepalives by default, the read loop never exits, and so the program never exits, and so STDOUT is never closed. I also found that adding a:
      Connection: close
      header fixed the problem, since then the server closes the connection, allowing the read loop to exit and then the program.
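A minimal sketch of a request that carries the Connection: close header. The helper name build_get_request is hypothetical, and the request is cut down to the headers needed for the illustration. It joins lines with CRLF and ends the header block with an empty line, which is what HTTP/1.1 specifies.

```perl
use strict;
use warnings;

# Build a minimal HTTP/1.1 GET request that asks the server to close the
# connection after the response. Header lines end in CRLF and an empty
# line terminates the header block.
# (build_get_request is a hypothetical helper name.)
sub build_get_request {
    my ($host, $path) = @_;
    return join "\r\n",
        "GET $path HTTP/1.1",
        "Host: $host",
        "Connection: close",
        "", "";
}
```

With the socket from the original program, print {$host} build_get_request('www.smartadserver.com', '/call/pubj/445/3197/138/M/5249542624/target?'); would send it. Once the server closes the connection, sysread returns 0, the loop ends, and the buffered output is flushed when the program exits.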

      FYI, I found this problem using a system call tracer (strace, specifically), which is a fantastic tool for debugging these sorts of things.

      So I don't think there is a body (in spite of the Content-Length: 210 header.)
      It's really there, the server sends a custom 302 document.
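Since the body is announced by Content-Length: 210, a client on a keep-alive connection can stop after exactly that many bytes instead of waiting for the server to close. A rough sketch under stated assumptions: read_response is a hypothetical helper, it handles only Content-Length framing (no chunked transfer encoding), and it reads the headers one byte at a time for simplicity.

```perl
use strict;
use warnings;

# Read one HTTP response from $sock: headers up to the blank line, then
# exactly Content-Length bytes of body. Returns (header_text, body).
# Simplification: chunked transfer encoding is not handled.
# (read_response is a hypothetical helper name.)
sub read_response {
    my ($sock) = @_;
    my $head = '';
    until ($head =~ /\r?\n\r?\n\z/) {
        my $n = sysread($sock, my $byte, 1);
        die "connection closed inside headers\n" unless $n;
        $head .= $byte;
    }
    my ($length) = $head =~ /^Content-Length:\s*(\d+)/mi;
    $length = 0 unless defined $length;
    my $body = '';
    while (length($body) < $length) {
        # Append at the current end of $body; sysread may return fewer
        # bytes than requested, so loop until the count is reached.
        my $n = sysread($sock, $body, $length - length($body), length($body));
        die "connection closed inside body\n" unless $n;
    }
    return ($head, $body);
}
```

Called as my ($head, $body) = read_response($host);, this returns as soon as the announced number of body bytes has arrived, so the program can print and exit even though the connection stays open.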

      (but I'm sorry, CapitaineCaverne, I don't have a solution other than using LWP)
Re: Can't read end data out of a socket
by moritz (Cardinal) on Jul 01, 2008 at 21:01 UTC
    If you don't want to re-invent the wheel, consider using LWP::UserAgent.
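A minimal sketch of what that could look like, assuming LWP::UserAgent is installed from CPAN. The helper name fetch_without_redirect is hypothetical. Setting max_redirect to 0 keeps LWP from following the 302, so the server's own redirect document is what comes back.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Fetch $url without following redirects, so a 302 response and its own
# body are returned as-is. Returns an HTTP::Response object.
# (fetch_without_redirect is a hypothetical helper name.)
sub fetch_without_redirect {
    my ($url) = @_;
    my $ua = LWP::UserAgent->new(max_redirect => 0, timeout => 10);
    return $ua->get($url);
}

# Usage (performs a real network request, so it is left commented out):
# my $res = fetch_without_redirect('http://www.smartadserver.com/call/pubj/445/3197/138/M/5249542624/target?');
# print $res->status_line, "\n", $res->decoded_content;
```

LWP takes care of message framing and connection handling, so neither the output buffering nor the keep-alive wait described above comes into play.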
      Setting $| did indeed solve the problem. Many thanks.
      CC.