in reply to Re: Re: Re: Forking processes / children / threads
in thread Forking processes / children / threads

Currently my code is doing ...
    foreach $keyword ( @ARGV ) {
        $key = join ' ', split /\+/, $keyword;
        $remote = IO::Socket::INET->new(
            Proto    => "tcp",
            PeerAddr => $host,
            PeerPort => $port,
        );
        unless ($remote) {
            die "cannot connect to http daemon on $host:$port";
        }
        $remote->autoflush(1);
        print $remote "GET $prefix$keyword HTTP/1.0" . $BLANK;
        $count = 0;
        while ( <$remote> ) {
            $count++ if /flist/i;
        }
        print "$key,$count\n";
        close $remote;
    }
And I am redirecting the output using >.
It's a pretty basic bit of code; it just gets complicated when it has to automatically run itself 8 times over different parts of the array. (I say 8 times because beyond 8 a performance degradation takes place; 8 seems optimal and can finish the whole job in about 40 minutes.)
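Rather than launching the script 8 times by hand, the same splitting can be done with fork() inside the script itself. Here's a minimal sketch of that idea; process_keyword is a made-up name standing in for the GET/count loop above, and the round-robin chunking is my own choice, not from the original code:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical worker sub -- replace the body with the GET/count
# loop from the post above. The name is mine, not from the original.
sub process_keyword {
    my ($keyword) = @_;
    my $key = join ' ', split /\+/, $keyword;
    print "$key,0\n";    # stub output in the same "keyword,count" format
}

# Fork $nproc children, each handling its own slice of @$keywords.
# Returns the number of children that exited non-zero.
sub run_parallel {
    my ($keywords, $nproc) = @_;

    # deal the keywords round-robin into $nproc buckets
    my @chunks;
    push @{ $chunks[ $_ % $nproc ] }, $keywords->[$_] for 0 .. $#$keywords;

    my @pids;
    for my $chunk (grep { defined } @chunks) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {                        # child: do its share, then exit
            process_keyword($_) for @$chunk;
            exit 0;
        }
        push @pids, $pid;                       # parent: remember the child
    }

    my $failed = 0;
    for my $pid (@pids) {
        waitpid $pid, 0;                        # reap each child
        $failed++ if $? != 0;
    }
    return $failed;
}

run_parallel(\@ARGV, 8) if @ARGV;   # 8 workers, per the timing above
```

One caveat: since all the children share the redirected stdout, the "keyword,count" lines from different children can come out in any order.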

Cheers,

Mark

Re: Re: Re: Re: Re: Forking processes / children / threads
by dws (Chancellor) on Nov 19, 2002 at 05:24 UTC
    If you have any influence over the site you're fetching data from, it would be a lot faster to write a CGI on their server to do the counting for you.

    That said, if all you're doing is counting lines that contain "flist", you might as well do a read() on the socket to get a larger buffer, then grep the buffer for "flist". Matching in huge files demonstrates a fast technique for grepping through a file without having to pull it all into memory.
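    A minimal sketch of that buffered-read idea, assuming the goal is still counting lines that match /flist/i; the sub name and the 64K buffer size are my own choices:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count lines matching /flist/i by read()ing large chunks instead of
# looping line by line. A trailing partial line is carried over so a
# line split across two chunks is still counted exactly once.
sub count_flist_lines {
    my ($fh) = @_;
    my ($count, $tail) = (0, '');
    while (read($fh, my $buf, 65536)) {
        $buf = $tail . $buf;
        $tail = '';
        $tail = $1 if $buf =~ s/([^\n]+)\z//;   # hold back partial line
        $count += grep { /flist/i } split /\n/, $buf;
    }
    $count++ if $tail =~ /flist/i;              # last line had no newline
    return $count;
}
```

    In the original loop this would be called as something like my $count = count_flist_lines($remote); after sending the GET.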

      I had a look at your code and ran a few tests with time on a 20-line file, and there wasn't any time difference; the main time waster is actually the request to the HTTP server and getting its response. Doing the same sort of thing on a local file I get fantastic response times. Unfortunately the only way I can get the data I need is from the web server, which is why I have been running the process 8 times: 8 simultaneous requests to the web server take much the same time as 1. As such I am trying to eliminate the slow point of the HTTP GET request by processing multiple GETs at a time.

      Cheers,

      Mark