EvanCarroll has asked for the wisdom of the Perl Monks concerning the following question:
I've been having a pretty awkward problem with a few POE modules. I have a script, included in full further down, that is just a basic parallel HTTP image downloader with some other minor logic.
The issue is that after a certain point, I get one
    Cannot connect to imgs.getauto.com:80 (connect error 110: Connection timed out)
And then each subsequent request returns the same thing. No packets are sent out, as shown with tethereal. I've used 'netstat -atn' to establish that my sockets are opening and closing as they should. They do not get stuck in FIN_WAIT2 (as the other POCO::Client::HTTP bug does).
Here is a dump of the request and response after I get bogged down in this endless loop of nothing:
    - &1 !!perl/hash:HTTP::Request
      _content: ''
      _headers: !!perl/hash:HTTP::Headers
        accept: image/*
        from: evan@dealermade.com
        host: imgs.getauto.com
        user-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.3) Gecko/2008101315 Ubuntu/8.10 (intrepid) Firefox/3.0.3
      _method: GET
      _protocol: HTTP/1.1
      _uri: !!perl/scalar:URI::http http://imgs.getauto.com/imgs/ag/ga/62/90/1/WDDNG71X47A036290-1.jpg
    - !!perl/hash:HTTP::Response
      _content: |
        <html>
        <HEAD><TITLE>Error: Internal Server Error</TITLE></HEAD>
        <BODY>
        <H1>Error: Internal Server Error</H1>
        Cannot connect to imgs.getauto.com:80 (connect error 110: Connection timed out)
        </BODY>
        </HTML>
      _headers: !!perl/hash:HTTP::Headers {}
      _msg: ~
      _rc: 500
      _request: *1
According to irc.perl.org's dngor (author of the module), that response, including the HTTP, is forged by the HTTP::Response package -- which actually comes close to making my blood boil.
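Since the 500 in the dump never came from the server, it can help to tell such locally built responses apart from real ones before deciding what to do with them. The following is only a heuristic sketch based on what the dump above shows (status 500, an empty header set, and a "Cannot connect to" body); the function name looks_locally_generated is hypothetical, and nothing here is a documented feature of POE::Component::Client::HTTP.

```perl
use strict;
use warnings;
use HTTP::Response;

# Heuristic, derived only from the dump above: a response built locally
# has status 500, no header fields at all, and a body that mentions the
# failed connect. A real server error normally carries some headers.
sub looks_locally_generated {
    my ($res) = @_;
    return 0 unless defined $res->code && $res->code == 500;
    # In scalar context header_field_names returns the number of fields.
    return 0 if scalar $res->headers->header_field_names;
    return $res->content =~ /Cannot connect to \S+ \(connect error \d+:/ ? 1 : 0;
}

# Two hand-made responses for illustration (not captured traffic).
my $local = HTTP::Response->new(
    500, undef, undef,
    "Cannot connect to imgs.getauto.com:80 (connect error 110: Connection timed out)\n",
);
my $remote = HTTP::Response->new(
    500, 'Internal Server Error',
    [ 'Content-Type' => 'text/html' ],
    "<html><body>Something broke on the server</body></html>\n",
);

print "local:  ", looks_locally_generated($local),  "\n";
print "remote: ", looks_locally_generated($remote), "\n";
```

In the got_response handler this check could sit next to the existing `$res->code == 500` branch, so that connect failures are logged or counted separately from genuine server errors.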
I've even tried to use the Perl debugger, to no avail. I set the NoTTY option and then set signal=1 and the whole thing crashes. The debugger does not seem to be POE-friendly. I'm totally at a loss. The versions of the modules I'm using are as follows:
    POE::Component::Client::Keepalive v0.23
    POE::Component::Client::HTTP v0.86
The full script:

    #!/usr/bin/env perl
    BEGIN { $DB::signal = 0; }
    use strict;
    use warnings;

    use Fcntl;
    use Digest::SHA1 qw();
    use IO::File;
    use Text::CSV;
    use File::Spec qw();
    use File::Basename qw();
    use File::Path qw();
    use File::stat;
    use Memoize;
    memoize( 'generate_tempname' );

    use constant VERBOSE => 1;
    use feature ':5.10';

    use HTTP::Request::Common qw(GET POST HEAD);

    sub POE::Kernel::ASSERT_DEFAULT () { 1 }
    use POE qw(Component::Client::HTTP); # Component::Client::Keepalive);

    #my $pool = POE::Component::Client::Keepalive->new( max_per_host => 4, timeout => 1800, keep_alive => 180 );

    POE::Component::Client::HTTP->spawn(
        Alias => 'dmua'
        , Streaming => 4096
        # , ConnectionManager => $pool
        , FollowRedirects => 2
        , Agent => 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.3) Gecko/2008101315 Ubuntu/8.10 (intrepid) Firefox/3.0.3'
        , From => 'evan@dealermade.com'
    );

    POE::Session->create(
        inline_states => {
            _start => \&client_start
            , _stop => \&client_stop
            , got_response => \&client_got_response
            , transfer_complete => \&finalize_transfer
        }
    );

    $poe_kernel->run();

    ### Event handlers begin here.
    sub client_start {
        my ($kernel, $heap) = @_[KERNEL, HEAP];

        ## $poe_kernel->sig(INT => "_stop");

        my $fh = IO::File->new( 'dealermade_pictures.csv', 'r' );
        my $header = $fh->getline;
        my $csv = Text::CSV->new;

        while ( my $line = $fh->getline ) {
            $csv->parse( $line );
            my ( $picid, $url, $lot, $is_primary ) = $csv->fields;

            my $temp = generate_tempname( $url );
            if ( -e $temp && -f $temp && -s $temp ) {
                $kernel->post( dmua => request => got_response => HEAD( $url ) );
            }
            else {
                $kernel->post( dmua => request => got_response => GET( $url, Accept => 'image/*' ) );
            }
        }
    }

    sub client_stop {
        my $heap = $_[HEAP];
    }

    sub client_got_response {
        my ($heap, $req, $res, $data ) = ( $_[HEAP], $_[ARG0]->[0], @{$_[ARG1]} );
        my $uri = $req->uri;
        my $temp = generate_tempname( $uri );

        given ( $req->method ) {

            when ( 'HEAD' ) {
                if ( -e $temp && -f $temp ) {
                    my $stat = stat( $temp );
                    my $badSize = $res->content_length && $stat->size != $res->content_length;
                    my $badDate = $stat->mtime - $res->fresh_until > 0;

                    ## My slow sledge hammer
                    ## use DateTime qw();
                    ## my $badDate = DateTime->from_epoch( epoch => $stat->mtime )
                    ##     ->subtract_datetime( DateTime->from_epoch( epoch => $res->fresh_until ) )
                    ##     ->is_positive
                    ## ;

                    if ( VERBOSE ) {
                        if ( $badDate || $badSize ) {
                            say "Posting to the kernel a request to redownload $uri";
                            say "\tBAD SIZE detected, our file is ". $stat->size ." and it should be ". $res->content_length
                                if $badSize
                            ;
                            say "\tBAD DATE detected -- file has since been modified" if $badDate;
                        }
                        else {
                            say "Skipping $uri -- all is current";
                        }
                    }

                    $poe_kernel->post( dmua => request => got_response => GET( $uri, Accept => 'image/*' ) )
                        if $badSize || $badDate
                    ;
                }
                else {
                    warn "HEAD requested on non-cached file $temp\n";
                }
            }

            when ( 'GET' ) {
                my $this = $_[HEAP]->{uri}{$uri};
                my $fh = $this->{fh};

                if ( !defined($res->code) || $res->code != '200' ) {
                    say $res->code . " was received from request to $uri";
                    if ( $res->code == 500 ) {
                        use XXX;
                        YYY [ $req, $res, $_[HEAP]->{connection} ];
                        $DB::signal=1;
                    }
                    return;
                }

                ## If we've never encountered a response from this request.
                unless ( $fh ) {
                    if ( VERBOSE ) {
                        say "Started download of $uri : " . $res->code;
                        say "\tDestination temp name:\t$temp";
                    }

                    ## If the file exists simply unlink it and start over.
                    ## This is here so we can refresh the data behind the url
                    if ( -e $temp && -f $temp ) {
                        say "\tUnlinking preexisting uri first" if VERBOSE;
                        unlink ( $temp );
                    }
                    ## Else we might have to create the path to it.
                    else {
                        my $path = File::Basename::dirname( $temp );
                        unless ( -d $path and -e $path ) {
                            File::Path::mkpath( $path );
                            say "\tCreating path:\t$path";
                        }
                    }

                    sysopen ( $fh , $temp , O_WRONLY|O_CREAT );
                    binmode($fh); ## win32 not required in linux
                    $this = { fh => $fh, temp => $temp, uri => $uri };
                    $_[HEAP]->{uri}{$uri} = $this;
                }

                ##
                ## If we have data send it to our file handle
                ##
                if ( defined $data ) {
                    print $fh $data;
                }
                ##
                ## If we have no more data hard link to store and remove
                ##
                else {
                    close $fh;
                    my $stor = generate_storename( $uri );
                    say "Linking $temp to $stor" if VERBOSE;

                    my $path = File::Basename::dirname( $stor );
                    File::Path::mkpath( $path ) unless -e $path && -d $path;
                    CORE::link( $temp, $stor ) unless -e $stor ;

                    delete $heap->{uri}{$this->{uri}};
                }
            }

        }
    }

    sub generate_tempname {
        my $uri = shift;
        my $sha1 = Digest::SHA1::sha1_hex( $uri );
        my ( $f1, $f2, $file ) = unpack ( 'A2A2A*', $sha1 );
        $uri =~ /.*([.].*?)$/;
        my $ext = $1;
        ## Parenthesized so the '.jpg' fallback applies to the extension only.
        File::Spec->catfile( qw/out temp/, $f1, $f2, $file . ($ext || '.jpg') );
    }

    sub generate_storename {
        my $uri = shift;
        my $tempname = generate_tempname($uri);
        my $io = IO::File->new( $tempname, 'r' );
        my $sha1 = Digest::SHA1->new;
        $sha1->addfile($io);
        $io->close;
        my ( $f1, $f2, $file ) = unpack ( 'A2A2A*', $sha1->hexdigest );
        $uri =~ /.*([.].*?)$/;
        my $ext = $1;
        #File::Spec->catfile( qw/out store/, $sha1->hexdigest . $ext );
        File::Spec->catfile( qw/out store/, $f1, $f2, $file . ($ext || '.jpg') );
    }

It should still fail with the KeepAlive stuff commented out (as it is above); it will take longer to get to the fail point though.

This is what strace will return after a certain point in time; notice it doesn't check sockets or anything complex...

    write(1, "---\n- &1 !!perl/hash:HTTP::Reque"..., 808) = 808
    write(1, "500 was received from request to"..., 100) = 100
    write(1, "---\n- &1 !!perl/hash:HTTP::Reque"..., 808) = 808
    write(1, "500 was received from request to"..., 100) = 100
    write(1, "---\n- &1 !!perl/hash:HTTP::Reque"..., 808) = 808
    write(1, "500 was received from request to"..., 100) = 100
    ... forever

Here is the output from POCO::Client::HTTP with the DEBUG and DEBUG_DATA variables set:

    T/O: request 149 timed out at /usr/local/lib/perl5/site_perl/5.10.0/POE/Component/Client/HTTP.pm line 377.
    I/O: removing request 149 at /usr/local/lib/perl5/site_perl/5.10.0/POE/Component/Client/HTTP.pm line 380.
    T/O: request 149 has timer 8948 at /usr/local/lib/perl5/site_perl/5.10.0/POE/Component/Client/HTTP.pm line 391.
    T/O: request 149 is wheel 153 at /usr/local/lib/perl5/site_perl/5.10.0/POE/Component/Client/HTTP.pm line 397.
    T/O: request_state = 0x04
    I/O: Disconnect, keepalive timeout or HTTP/1.0. at /usr/local/lib/perl5/site_perl/5.10.0/POE/Component/Client/HTTP.pm line 421.

I don't know enough about what this stuff means, but this is the only suspicious pattern I see repeat itself in strace. My guess: it tries to set a socket with some deep voodoo (failing), and then seek to it (also failing), then it tries to do it all again. Then it assumes it is open, and fails -- and never reconnects.

    socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP) = 4
    ioctl(4, SNDCTL_TMR_TIMEBASE or TCGETS, 0xbfb8b5d8) = -1 EINVAL (Invalid argument)
    _llseek(4, 0, 0xbfb8b600, SEEK_CUR) = -1 ESPIPE (Illegal seek)
    ioctl(4, SNDCTL_TMR_TIMEBASE or TCGETS, 0xbfb8b5d8) = -1 EINVAL (Invalid argument)
    _llseek(4, 0, 0xbfb8b600, SEEK_CUR) = -1 ESPIPE (Illegal seek)
    fcntl64(4, F_SETFD, FD_CLOEXEC) = 0
    getpeername(4, 0x1d08a4e8, [256]) = -1 ENOTCONN (Transport endpoint is not connected)
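A side note on the filename helpers in the script (generate_tempname and generate_storename), unrelated to the connect failures: they append a fallback extension, and in Perl the `.` operator binds tighter than `||`, so the grouping decides whether the fallback can ever apply. A minimal core-Perl sketch, with hypothetical function names and made-up values:

```perl
use strict;
use warnings;

# Groups as ($file . $ext) || '.jpg': the fallback only fires when the
# whole concatenation is false, which a non-empty $file prevents.
sub name_without_parens {
    my ($file, $ext) = @_;
    return $file . $ext || '.jpg';
}

# Groups as $file . ($ext || '.jpg'): the fallback replaces an empty
# extension, which is presumably what was intended.
sub name_with_parens {
    my ($file, $ext) = @_;
    return $file . ($ext || '.jpg');
}

print name_without_parens('abc', ''), "\n";   # prints "abc"
print name_with_parens('abc', ''),    "\n";   # prints "abc.jpg"
```

With a URL that always ends in an extension the two forms give the same name, so this only matters for extension-less URLs.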
Re: POE Component HTTP problem
by bingos (Vicar) on Nov 28, 2008 at 09:36 UTC
Re: POE Component HTTP problem
by waba (Monk) on Nov 27, 2008 at 21:30 UTC
Re: POE Component HTTP problem
by EvanCarroll (Chaplain) on Nov 30, 2008 at 01:54 UTC