in reply to Re: Re: Forking processes / children / threads
in thread Forking processes / children / threads

But the main thing is that the actual overhead of my process (opening a socket to a local HTTP server) is very low, yet running a 50k line file as one serial process takes twice as long as splitting the file in two and running both halves at once.

Can you characterize the processing that you're doing on lines from this file? If it's compute-intensive processing (with no program-induced blocking for I/O), then you're not likely to see much improvement from processing sections of the file in parallel.

Where multiple processes or threads win is where the processing involves activities that block for I/O.
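
For example, here's a minimal sketch of splitting a job list between two forked children; while one child sits blocked on I/O, the other keeps working (process_job() is a hypothetical stand-in for whatever you do per item):

use strict;
use warnings;

sub process_job {    # hypothetical stand-in for the real per-item work
    my ($job) = @_;
    # ... the blocking I/O (e.g. an HTTP GET) would happen here ...
}

my @jobs = @ARGV;
my $mid  = int(@jobs / 2);

my @pids;
for my $half ( [ @jobs[ 0 .. $mid - 1 ] ], [ @jobs[ $mid .. $#jobs ] ] ) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {                  # child: work through its half
        process_job($_) for @$half;
        exit 0;
    }
    push @pids, $pid;                 # parent: remember the child
}
waitpid($_, 0) for @pids;             # block until both halves are done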


Re: Re: Re: Re: Forking processes / children / threads
by msergeant (Novice) on Nov 19, 2002 at 05:03 UTC
    Currently my code is doing ...
    foreach $keyword ( @ARGV ) {
        $key = join(' ', split(/\+/, $keyword));
        $remote = IO::Socket::INET->new(
            Proto    => "tcp",
            PeerAddr => $host,
            PeerPort => $port,
        );
        unless ($remote) {
            die "cannot connect to http daemon on $host on $port";
        }
        $remote->autoflush(1);
        print $remote "GET $prefix$keyword HTTP/1.0" . $BLANK;
        $count = 0;
        while ( <$remote> ) {
            if (/flist/i) { $count++; }
        }
        print "$key,$count\n";
        close $remote;
    }
    And I am redirecting the output using >
    It's a pretty basic bit of code; it just gets complicated when it has to automatically run itself 8 times over different parts of the array. (I say 8 times because performance degrades beyond 8 processes; 8 seems optimal and finishes the whole job in 40 minutes.)
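
    Something like this sketch is the shape of the splitting (fetch_count() here is a hypothetical stand-in for the GET-and-count loop above):

    use strict;
    use warnings;

    my $workers  = 8;                    # 8 seems optimal on this box
    my @keywords = @ARGV;

    # Deal the keywords round-robin into $workers buckets.
    my @chunks;
    push @{ $chunks[ $_ % $workers ] }, $keywords[$_] for 0 .. $#keywords;

    my @pids;
    for my $chunk (grep { $_ && @$_ } @chunks) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {                         # child
            fetch_count($_) for @$chunk;         # hypothetical per-keyword sub
            exit 0;
        }
        push @pids, $pid;                        # parent
    }
    waitpid($_, 0) for @pids;

    One catch: all 8 children share the redirected output file, so the "$key,$count" lines can interleave; having each child write its own file sidesteps that.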

    Cheers,

    Mark
      If you have any influence over the site you're fetching data from, it would be a lot faster to write a CGI on their server to do the counting for you.

      That said, if all you're doing is counting lines that contain "flist", you might as well do a read() on the socket to get a larger buffer, then grep the buffer for "flist". Matching in huge files demonstrates a fast technique for grepping through a file without having to pull it all into memory.
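
      A sketch of that idea (note it counts occurrences rather than matching lines, and it carries a 4-byte tail so a match split across two read()s isn't lost):

      my ($count, $buf, $tail) = (0, '', '');
      while ( read($remote, $buf, 65536) ) {      # 64k chunks off the socket
          $buf = $tail . $buf;
          $count++ while $buf =~ /flist/gi;       # count every occurrence
          $tail = length($buf) >= 4 ? substr($buf, -4) : $buf;
      }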

        I had a look at your code and did a few tests with time on a 20 line file, and there wasn't any time difference; the main time waster is actually the request to the HTTP server and getting its response. Doing the same sort of thing on a file locally, I get fantastic response times. Unfortunately the only way I can get the data I need is from the web server, which is why I have been running the process 8 times: 8 requests to the local web server take much the same time as 1. As such I am trying to eliminate the slow point, the HTTP GET request, by processing multiple GETs at a time.
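
        What I'm after is something along these lines, keeping 8 GETs in flight at once; Parallel::ForkManager from CPAN looks like one way to do it (a sketch only; get_and_count() stands in for the socket loop above):

        use strict;
        use warnings;
        use Parallel::ForkManager;

        my $pm = Parallel::ForkManager->new(8);   # at most 8 children alive

        for my $keyword (@ARGV) {
            $pm->start and next;          # parent continues; child runs on
            get_and_count($keyword);      # stand-in for the GET-and-count loop
            $pm->finish;                  # child exits here
        }
        $pm->wait_all_children;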

        Cheers,

        Mark