I've been having a pretty awkward problem with a few POE modules. I have a script that does the following:

  1. Reads through a CSV file, collecting URLs.
  2. Generates a SHA1 of each URL, sha1(url), to use as the local file name (there is other minor logic here for directories).
  3. If the sha1(url) file exists, it posts a HEAD request for the URL to the kernel; otherwise it posts a GET request.
  4. In the HEAD response handler: posts a GET request to the kernel if the file needs to be re-downloaded.
  5. In the GET response handler, if there is data: writes it to the sha1(url) file.
  6. If there is no more data, hard links the sha1(url) file to sha1(file) (this way two different files can be hosted at the same URL at different times).
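
The naming scheme in steps 2 and 6 can be sketched roughly as follows. This is a minimal sketch rather than the script's exact code: path_for_url is a hypothetical helper, the core Digest::SHA module stands in for Digest::SHA1, and the '.jpg' fallback extension and the example URL are assumptions.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);
use File::Spec;

# Hypothetical helper: map a URL to a sharded path below a base directory.
sub path_for_url {
    my ( $url, @base ) = @_;
    my $sha1 = sha1_hex($url);

    # Split the 40-character hex digest into two 2-character shard
    # directories and a 36-character file name.
    my ( $d1, $d2, $file ) = unpack 'A2 A2 A*', $sha1;

    # Take the extension from the last path segment, ignoring any query
    # string or fragment; fall back to '.jpg' (an assumption) if none.
    my ($ext) = $url =~ m{(\.[^./?#]+)(?:[?#].*)?$};
    $ext //= '.jpg';

    return File::Spec->catfile( @base, $d1, $d2, $file . $ext );
}

print path_for_url( 'http://example.com/a/b.png', qw(out temp) ), "\n";
```

The same helper could serve the store path in step 6 by hashing the file contents instead of the URL and passing a different base directory.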

There is some other minor logic, but this is just a basic parallel HTTP image downloader. The issue is that after a certain point, I get one of these:

Cannot connect to imgs.getauto.com:80 (connect error 110: Connection timed out)

And then each subsequent request returns the same thing. No packets are sent out, as shown with tethereal. I've used 'netstat -atn' to establish that my sockets are opening and closing as they should. They do not get stuck in FIN_WAIT2 (as in the other POCO::Client::HTTP bug).
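
For anyone repeating the netstat check, here is a minimal sketch of tallying socket states from 'netstat -atn' output. The helper name count_tcp_states and the sample addresses are made up for illustration, and it assumes the usual Linux column layout with the state in the last column.

```perl
use strict;
use warnings;

# Hypothetical helper: count TCP connection states in `netstat -atn` text.
sub count_tcp_states {
    my ($text) = @_;
    my %count;
    for my $line ( split /\n/, $text ) {
        my @f = split ' ', $line;

        # Data rows start with "tcp" or "tcp6" and have at least six
        # columns; the state is taken from the last one.
        next unless @f >= 6 && $f[0] =~ /^tcp6?$/;
        $count{ $f[-1] }++;
    }
    return \%count;
}

# Sample input (invented addresses); in practice read from `netstat -atn`.
my $sample = <<'END';
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
tcp        0      0 10.0.0.5:50122          192.0.2.10:80           ESTABLISHED
tcp        0      0 10.0.0.5:50123          192.0.2.10:80           FIN_WAIT2
tcp        0      0 10.0.0.5:50124          192.0.2.10:80           FIN_WAIT2
END

my $states = count_tcp_states($sample);
printf "%-12s %d\n", $_, $states->{$_} for sort keys %$states;
```

A FIN_WAIT2 count that keeps growing between runs would point at the stuck-socket bug; a flat count, as in my case, does not.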

Here is a dump of the request and response after I get bogged down in this endless loop of nothing:

- &1 !!perl/hash:HTTP::Request
  _content: ''
  _headers: !!perl/hash:HTTP::Headers
    accept: image/*
    from: evan@dealermade.com
    host: imgs.getauto.com
    user-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.3) Gecko/2008101315 Ubuntu/8.10 (intrepid) Firefox/3.0.3
  _method: GET
  _protocol: HTTP/1.1
  _uri: !!perl/scalar:URI::http http://imgs.getauto.com/imgs/ag/ga/62/90/1/WDDNG71X47A036290-1.jpg
- !!perl/hash:HTTP::Response
  _content: |
    <html>
    <HEAD><TITLE>Error: Internal Server Error</TITLE></HEAD>
    <BODY>
    <H1>Error: Internal Server Error</H1>
    Cannot connect to imgs.getauto.com:80 (connect error 110: Connection timed out)
    </BODY>
    </HTML>
  _headers: !!perl/hash:HTTP::Headers {}
  _msg: ~
  _rc: 500
  _request: *1

According to dngor on irc.perl.org (the author of the module), that response, including the HTML, is forged by the HTTP::Response package, which comes close to making my blood boil.

I've even tried the Perl debugger, to no avail. I set the NoTTY option and then set signal=1, and the whole thing crashes. The debugger does not seem to be POE-friendly. I'm totally at a loss. The versions of the modules I'm using are as follows:

POE::Component::Client::Keepalive v0.23
POE::Component::Client::HTTP v0.86

#!/usr/bin/env perl
BEGIN { $DB::signal = 0; }
use strict;
use warnings;

use Fcntl;
use Digest::SHA1 qw();
use IO::File;
use Text::CSV;
use File::Spec qw();
use File::Basename qw();
use File::Path qw();
use File::stat;

use Memoize;
memoize( 'generate_tempname' );

use constant VERBOSE => 1;

use feature ':5.10';

use HTTP::Request::Common qw(GET POST HEAD);

sub POE::Kernel::ASSERT_DEFAULT () { 1 }
use POE qw(Component::Client::HTTP);    # Component::Client::Keepalive

#my $pool = POE::Component::Client::Keepalive->new( max_per_host => 4, timeout => 1800, keep_alive => 180 );

POE::Component::Client::HTTP->spawn(
    Alias               => 'dmua'
    , Streaming         => 4096
#   , ConnectionManager => $pool
    , FollowRedirects   => 2
    , Agent             => 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.3) Gecko/2008101315 Ubuntu/8.10 (intrepid) Firefox/3.0.3'
    , From              => 'evan@dealermade.com'
);

POE::Session->create(
    inline_states => {
        _start              => \&client_start
        , _stop             => \&client_stop
        , got_response      => \&client_got_response
        , transfer_complete => \&finalize_transfer
    }
);

$poe_kernel->run();

### Event handlers begin here.
sub client_start {
    my ($kernel, $heap) = @_[KERNEL, HEAP];

    ## $poe_kernel->sig(INT => "_stop");

    my $fh = IO::File->new( 'dealermade_pictures.csv', 'r' );
    my $header = $fh->getline;    ## skip the CSV header row
    my $csv = Text::CSV->new;

    while ( my $line = $fh->getline ) {
        $csv->parse( $line );
        my ( $picid, $url, $lot, $is_primary ) = $csv->fields;

        ## HEAD if we already hold a non-empty copy, otherwise GET
        my $temp = generate_tempname( $url );
        if ( -e $temp && -f $temp && -s $temp ) {
            $kernel->post( dmua => request => got_response => HEAD( $url ) );
        }
        else {
            $kernel->post( dmua => request => got_response => GET( $url, Accept => 'image/*' ) );
        }
    }
}

sub client_stop {
    my $heap = $_[HEAP];
}

sub client_got_response {
    my ($heap, $req, $res, $data ) = ( $_[HEAP], $_[ARG0]->[0], @{$_[ARG1]} );

    my $uri = $req->uri;
    my $temp = generate_tempname( $uri );

    given ( $req->method ) {

        when ( 'HEAD' ) {
            if ( -e $temp && -f $temp ) {
                my $stat = stat( $temp );
                my $badSize = $res->content_length && $stat->size != $res->content_length;
                my $badDate = $stat->mtime - $res->fresh_until > 0;

                ## My slow sledge hammer
                ## use DateTime qw();
                ## my $badDate = DateTime->from_epoch( epoch => $stat->mtime )
                ##     ->subtract_datetime( DateTime->from_epoch( epoch => $res->fresh_until ) )
                ##     ->is_positive
                ## ;

                if ( VERBOSE ) {
                    if ( $badDate || $badSize ) {
                        say "Posting to the kernel a request to redownload $uri";
                        say "\tBAD SIZE detected, our file is ". $stat->size ." and it should be ". $res->content_length
                            if $badSize
                        ;
                        say "\tBAD DATE detected -- file has since been modified"
                            if $badDate;
                    }
                    else {
                        say "Skipping $uri -- all is current";
                    }
                }

                $poe_kernel->post( dmua => request => got_response => GET( $uri, Accept => 'image/*' ) )
                    if $badSize || $badDate
                ;
            }
            else {
                warn "HEAD requested on non-cached file $temp\n";
            }
        }

        when ( 'GET' ) {
            my $this = $_[HEAP]->{uri}{$uri};
            my $fh = $this->{fh};

            if ( !defined($res->code) || $res->code != '200' ) {
                say $res->code . " was received from request to $uri";
                if ( $res->code == 500 ) {
                    use XXX;
                    YYY [ $req, $res, $_[HEAP]->{connection} ];
                    $DB::signal=1;
                }
                return;
            }

            ## If we've never encountered a response from this request.
            unless ( $fh ) {
                if ( VERBOSE ) {
                    say "Started download of $uri : " . $res->code;
                    say "\tDestination temp name:\t$temp";
                }

                ## If the file exists simply unlink it and start over.
                ## This is here so we can refresh the data behind the url
                if ( -e $temp && -f $temp ) {
                    say "\tUnlinking preexisting uri first" if VERBOSE;
                    unlink ( $temp );
                }
                ## Else we might have to create the path to it.
                else {
                    my $path = File::Basename::dirname( $temp );
                    unless ( -d $path and -e $path ) {
                        File::Path::mkpath( $path );
                        say "\tCreating path:\t$path";
                    }
                }

                sysopen ( $fh , $temp , O_WRONLY|O_CREAT );
                binmode($fh); ## win32 not required in linux
                $this = { fh => $fh, temp => $temp, uri => $uri };
                $_[HEAP]->{uri}{$uri} = $this;
            }

            ##
            ## If we have data send it to our file handle
            ##
            if ( defined $data ) {
                print $fh $data;
            }
            ##
            ## If we have no more data hard link to store and remove
            ##
            else {
                close $fh;
                my $stor = generate_storename( $uri );

                say "Linking $temp to $stor" if VERBOSE;

                my $path = File::Basename::dirname( $stor );
                File::Path::mkpath( $path ) unless -e $path && -d $path;

                CORE::link( $temp, $stor )
                    unless -e $stor
                ;

                delete $heap->{uri}{$this->{uri}};
            }
        }
    }
}

## Temp path: out/temp/<2 hex>/<2 hex>/<rest of sha1(url)><ext>
sub generate_tempname {
    my $uri = shift;

    my $sha1 = Digest::SHA1::sha1_hex( $uri );
    my ( $f1, $f2, $file ) = unpack ( 'A2A2A*', $sha1 );

    $uri =~ /.*([.].*?)$/;
    my $ext = $1;

    File::Spec->catfile( qw/out temp/, $f1, $f2, $file . $ext||'.jpg' );
}

## Store path: same layout under out/store, keyed on sha1 of the file contents
sub generate_storename {
    my $uri = shift;

    my $tempname = generate_tempname($uri);

    my $io = IO::File->new( $tempname, 'r' );
    my $sha1 = Digest::SHA1->new;
    $sha1->addfile($io);
    $io->close;

    my ( $f1, $f2, $file ) = unpack ( 'A2A2A*', $sha1->hexdigest );

    $uri =~ /.*([.].*?)$/;
    my $ext = $1;

    #File::Spec->catfile( qw/out store/, $sha1->hexdigest . $ext );
    File::Spec->catfile( qw/out store/, $f1, $f2, $file . $ext||'.jpg' );
}

This is what strace returns after a certain point in time. Notice that it doesn't check sockets or do anything complex:
write(1, "---\n- &1 !!perl/hash:HTTP::Reque"..., 808) = 808
write(1, "500 was received from request to"..., 100) = 100
write(1, "---\n- &1 !!perl/hash:HTTP::Reque"..., 808) = 808
write(1, "500 was received from request to"..., 100) = 100
write(1, "---\n- &1 !!perl/hash:HTTP::Reque"..., 808) = 808
write(1, "500 was received from request to"..., 100) = 100
... forever

Here is the output from POCO::Client::HTTP with the DEBUG and DEBUG_DATA variables set:
T/O: request 149 timed out at /usr/local/lib/perl5/site_perl/5.10.0/POE/Component/Client/HTTP.pm line 377.
I/O: removing request 149 at /usr/local/lib/perl5/site_perl/5.10.0/POE/Component/Client/HTTP.pm line 380.
T/O: request 149 has timer 8948 at /usr/local/lib/perl5/site_perl/5.10.0/POE/Component/Client/HTTP.pm line 391.
T/O: request 149 is wheel 153 at /usr/local/lib/perl5/site_perl/5.10.0/POE/Component/Client/HTTP.pm line 397.
T/O: request_state = 0x04
I/O: Disconnect, keepalive timeout or HTTP/1.0. at /usr/local/lib/perl5/site_perl/5.10.0/POE/Component/Client/HTTP.pm line 421.

I don't know enough about what this stuff means, but this is the only suspicious pattern I see repeating in strace. My guess: it tries to set up a socket with some deep voodoo (failing), then seeks on it (also failing), then tries to do it all again. Then it assumes the socket is open, fails, and never reconnects.
socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP) = 4
ioctl(4, SNDCTL_TMR_TIMEBASE or TCGETS, 0xbfb8b5d8) = -1 EINVAL (Invalid argument)
_llseek(4, 0, 0xbfb8b600, SEEK_CUR) = -1 ESPIPE (Illegal seek)
ioctl(4, SNDCTL_TMR_TIMEBASE or TCGETS, 0xbfb8b5d8) = -1 EINVAL (Invalid argument)
_llseek(4, 0, 0xbfb8b600, SEEK_CUR) = -1 ESPIPE (Illegal seek)
fcntl64(4, F_SETFD, FD_CLOEXEC) = 0
getpeername(4, 0x1d08a4e8, [256]) = -1 ENOTCONN (Transport endpoint is not connected)

It should still fail with the KeepAlive stuff commented out (as it is above); it will just take longer to reach the failure point.

DATA FILE

The data file can be found at http://dealermade.com/dealermade_pictures.csv


Evan Carroll
I hack for the ladies.
www.EvanCarroll.com

In reply to POE Component HTTP problem by EvanCarroll
