in reply to Re: Forking processes / children / threads
in thread Forking processes / children / threads

The machines I would be running this on are all dual processor (Sun or Intel). The main thing is that the per-request overhead of my process (opening a socket to a local http server) is very low, yet running a 50k line file serially takes twice as long as splitting that file in two and running both halves in parallel. I want to run this script via cron and have it do automagically everything I currently do by hand (splitting the file into 8, running 8 processes of my current script, then concatenating their output). Unfortunately I can't try 5.8, as it has a few bugs with some of our other code that works fine under 5.6.1.
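The by-hand cycle described here can be scripted end to end and dropped into cron. A minimal sketch of the split / background workers / wait / concatenate pattern, using made-up file names and `wc -l` as a stand-in for the real Perl script:

```shell
#!/bin/sh
# Sketch of the manual pipeline: split the input, run one background
# worker per chunk, wait for all of them, then concatenate the output.
# keywords.txt and the worker command (wc -l) are stand-ins; swap in
# the real 50k-line file and the Perl script.
seq 1 80 > keywords.txt            # stand-in input file
split -l 10 keywords.txt chunk.    # 8 chunks of 10 lines each
for f in chunk.??; do
    wc -l < "$f" > "$f.out" &      # one backgrounded worker per chunk
done
wait                               # block until every worker has exited
cat chunk.??.out > counts.txt      # concatenate the per-chunk output
```

With the real script, the worker line would invoke it on each chunk and redirect its output, exactly as done by hand today.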

Cheers,

Mark

Re: Re: Re: Forking processes / children / threads
by dws (Chancellor) on Nov 19, 2002 at 04:58 UTC
    The main thing is that the per-request overhead of my process (opening a socket to a local http server) is very low, yet running a 50k line file serially takes twice as long as splitting that file in two and running both halves in parallel.

    Can you characterize the processing that you're doing on lines from this file? If it's compute-intensive (with no program-induced blocking for I/O), then you're not likely to see much improvement from processing sections of the file in parallel.

    Multiple processes or threads win when the processing involves activities that block for I/O.

      Currently my code is doing ...
      foreach $keyword ( @ARGV ) {
          $key = join(' ', split(/\+/, $keyword));
          $remote = IO::Socket::INET->new(
              Proto    => "tcp",
              PeerAddr => $host,
              PeerPort => $port,
          );
          unless ($remote) {
              die "cannot connect to http daemon on $host on $port";
          }
          $remote->autoflush(1);
          print $remote "GET $prefix$keyword HTTP/1.0" . $BLANK;
          $count = 0;
          while ( <$remote> ) {
              if (/flist/i) { $count++; }
          }
          print "$key,$count\n";
          close $remote;
      }
      And I am redirecting the output using >
      It's a pretty basic bit of code; it just gets complicated when it has to run itself automatically 8 times over different parts of the array. (I say 8 times because beyond 8 a performance degradation sets in; 8 seems optimal and finishes the whole job in 40 minutes.)
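      A fork-based version could keep everything in one script instead of splitting the input by hand. This is only a sketch under assumed names: the worker count, the output file names, and the `process_keyword()` placeholder (standing in for the socket-and-count loop above) are all hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sketch: fork one child per slice of the keyword list.
# process_keyword() is a placeholder for the real socket/count code.
my @keywords = @ARGV ? @ARGV : map { "kw$_" } 1 .. 16;
my $workers  = 8;

# Deal the keywords round-robin across the workers.
my @slices;
push @{ $slices[ $_ % $workers ] }, $keywords[$_] for 0 .. $#keywords;

my @pids;
for my $w (0 .. $workers - 1) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {                               # child
        open my $out, '>', "counts.$w" or die "open: $!";
        print {$out} process_keyword($_), "\n" for @{ $slices[$w] || [] };
        close $out;
        exit 0;
    }
    push @pids, $pid;                              # parent keeps forking
}
waitpid($_, 0) for @pids;                          # reap every child
# concatenating counts.0 .. counts.7 would follow, as with the manual method

sub process_keyword {                              # stand-in for the real fetch
    my ($kw) = @_;
    return "$kw,0";
}
```

      Each child writes to its own file, so no locking is needed; the parent just waits for all 8 and concatenates afterwards.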

      Cheers,

      Mark
        If you have any influence over the site you're fetching data from, it would be a lot faster to write a CGI on their server to do the counting for you.

        That said, if all you're doing is counting lines that contain "flist", you might as well do a read() on the socket to get a larger buffer, then grep the buffer for "flist". Matching in huge files demonstrates a fast technique for grepping through a file without having to pull it all into memory.
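        A buffered version of the counting loop might look like this. Only a sketch: the helper name (`count_flist`) and the 64k buffer size are assumptions, and a short carry-over tail handles a match split across two read() chunks.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch of the buffered-read idea: pull large buffers off the handle
# with read() and count "flist" matches per buffer, instead of looping
# line by line. count_flist and the buffer size are assumed names.
sub count_flist {
    my ($fh) = @_;
    my ($count, $tail) = (0, '');
    while (read($fh, my $buf, 65536)) {
        $buf = $tail . $buf;              # rejoin a match split by the chunk edge
        $count++ while $buf =~ /flist/gi;
        # keep length("flist") - 1 trailing chars: too short to hold a
        # complete match, so nothing is ever counted twice
        $tail = length($buf) >= 4 ? substr($buf, -4) : $buf;
    }
    return $count;
}
```

        The same call works on the socket handle, since read() blocks for data just as the line-reading loop does.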

Re: Re: Re: Forking processes / children / threads
by pg (Canon) on Nov 19, 2002 at 05:07 UTC
    To split the file, you can try the UNIX split command.
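    For example (the file name and line counts here are illustrative), a 50,000-line file splits into the 8 pieces mentioned above like this:

```shell
# Hypothetical numbers: split a 50,000-line file into 6,250-line
# chunks; split names them part.aa, part.ab, ... part.ah.
seq 1 50000 > keywords.txt       # stand-in input file
split -l 6250 keywords.txt part.
wc -l part.??                    # eight chunks of 6250 lines each
```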