slloyd has asked for the wisdom of the Perl Monks concerning the following question:

For some reason this code hangs for a few seconds when fetching images from URLs. I sped up HTML requests by watching for other signs that I was at the end of the data stream. What am I missing? Is there a way to speed up the socket closure once it is done reading?
#!perl
use strict;
use Socket;

my $url  = "http://www.cpan.org/misc/jpg/cpan.jpg";
my $host = "www.cpan.org";
$| = 1;
my $start = times;
my ( $iaddr, $paddr, $proto );
$iaddr = inet_aton( $host );
$paddr = sockaddr_in( 80, $iaddr );
$proto = getprotobyname( 'tcp' );
unless ( socket( SOCK, PF_INET, SOCK_STREAM, $proto ) ) {
    die "ERROR Dude: getUrl socket: $!";
}
unless ( connect( SOCK, $paddr ) ) {
    die "getUrl connect: $!\n";
}
my @head = (
    "GET $url HTTP/1.1",
    "User-Agent: Mozilla/4.78 [en] (X11; U; Safemode Linux i386)",
    "Pragma: no-cache",
    "Host: $host",
    "Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/png, */*",
    "Accept-Language: en"
);
push( @head, "", "" );
# Build header and print to socket
my $header = join( "\015\012", @head );
print "sending request\n$header\n\n";
select SOCK;
$| = 1;
binmode SOCK;
print SOCK $header;
my $body = '';
while ( <SOCK> ) {
    my $data = $_;
    $data =~ s/[\r\n\t]+$//s;
    $data =~ s/^[\r\n\t]+//s;
    last if $data =~ /^0$/s;
    my $len = length($data);
    #print STDOUT "len:$len\n";
    $body .= $data;
    last if $data =~ /\<\/html\>$/is;
    if ( $data =~ /\<\/body\>$/is ) {
        $body .= qq|</html>|;
        last;
    }
    #print STDOUT "$data\n";
}
# die rather than return: this code is not inside a subroutine
unless ( close( SOCK ) ) {
    die "getUrl close: $!";
}
select STDOUT;
my $end  = times;
my $diff = $end - $start;
print "Took $diff to access page\n";

Re: using Socket to get urls hangs for several seconds.
by Errto (Vicar) on Feb 15, 2005 at 04:56 UTC
    If this is for a serious application, use LWP. It does these things for you. If the purpose of this code is to teach yourself socket programming, that's a different issue. The code looks basically OK, except that I wouldn't put the response through the HTML parsing loop unless you've actually confirmed that the Content-Type is 'text/html'. Otherwise you're doing unnecessary work, and you're also splitting on newlines, which is a strange thing to do when reading binary data. If you would like to know how to download a file over HTTP yourself, look at the source code for LWP::Protocol and LWP::Protocol::http. This helped me recently, in fact, with a similar issue.
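    To make the point about binary data concrete, here is a minimal sketch of reading a response without splitting the body on newlines. The helper name read_response is made up for illustration, and the sketch assumes the server sends a Content-Length header (it does not handle chunked transfer encoding). It reads the header block up to the blank line, then reads exactly Content-Length bytes, so it never has to wait for the server to close the connection.

```perl
use strict;
use warnings;

# Hypothetical helper: read one HTTP response from an already-open,
# binmode'd filehandle. Returns the status line, a hash ref of headers
# (names lower-cased), and the raw body.
sub read_response {
    my ($fh) = @_;

    # Read everything up to the blank line that ends the headers.
    my $head = do { local $/ = "\015\012\015\012"; <$fh> };
    die "no response headers\n" unless defined $head;

    my ( $status, @lines ) = split /\015\012/, $head;
    my %header;
    for my $line (@lines) {
        my ( $name, $value ) = split /:\s*/, $line, 2;
        $header{ lc $name } = $value if defined $value;
    }

    # Assumption: the server sent Content-Length (no chunked encoding).
    my $want = $header{'content-length'};
    die "no Content-Length; this sketch cannot tell where the body ends\n"
        unless defined $want;

    # Read exactly that many bytes; never split binary data on newlines.
    my $body = '';
    while ( length($body) < $want ) {
        my $got = read( $fh, $body, $want - length($body), length($body) );
        die "read failed: $!\n" unless defined $got;
        last if $got == 0;    # peer closed early
    }
    return ( $status, \%header, $body );
}
```

    With the socket from the original post, this would be called as read_response(\*SOCK) after the request has been printed; the Content-Type check is then a lookup of $header->{'content-type'} before any HTML-specific handling.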
Re: using Socket to get urls hangs for several seconds.
by merlyn (Sage) on Feb 15, 2005 at 04:54 UTC
    First, why are you not using LWP, especially LWP::Simple?
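    For comparison, a minimal sketch of the LWP::Simple route. The wrapper name fetch_url is hypothetical; the only library call is LWP::Simple's get, which returns the document content or undef on failure.

```perl
use strict;
use warnings;

# Hypothetical wrapper around LWP::Simple; the name fetch_url is an
# illustration, not part of the module.
sub fetch_url {
    my ($url) = @_;
    require LWP::Simple;    # loaded on first use
    my $content = LWP::Simple::get($url);
    die "could not fetch $url\n" unless defined $content;
    return $content;
}
```

    Calling fetch_url("http://www.cpan.org/misc/jpg/cpan.jpg") would return the image bytes, with the connection handling left to the library.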

    Second, in this place:

    "GET $url HTTP/1.1",
    why are you saying "I want protocol 1.1, including keep-alive connections", when in fact you don't want keep-alive connections?
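    One way to say what you mean, sketched under the assumption that you stay on HTTP/1.1: send "Connection: close" so the server closes the socket after the response and a read-to-EOF loop ends promptly. (Requesting HTTP/1.0 instead is another option.) The helper name build_request is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: build a request that asks the server to close
# the connection after the response, so reading to EOF terminates.
sub build_request {
    my ( $host, $path ) = @_;
    my @head = (
        "GET $path HTTP/1.1",
        "Host: $host",
        "Connection: close",    # opt out of HTTP/1.1 keep-alive
        "", "",                 # blank line ends the header block
    );
    return join "\015\012", @head;
}
```

    The result of build_request('www.cpan.org', '/misc/jpg/cpan.jpg') can be printed to the socket in place of the original $header.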

    And guess what. Fixing problem 1 would have eliminated the need to be very smart about problems like problem 2. Do not reinvent the wheel until you've studied prior art!

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.
