in reply to Re^3: Converting a parallel-serial shell script
in thread Converting a parallel-serial shell script

Finally, you could perhaps shed some light on how what I am talking about is not scalable ... Portable? .... Efficient?

I'm not for one moment going to suggest that a forking solution, where (native) forks are available, couldn't be just as scalable and efficient. Or even more so. But until you have implemented such a solution, you will not be aware of how hard it can be to make it so. And you won't truly appreciate the simplicity of the threaded version until you see the complexity of the forked version.

And when you post your solution, we can try running them on *nix and Windows and compare them against those three criteria.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

Re^5: Converting a parallel-serial shell script
by kubrat (Scribe) on Sep 26, 2008 at 11:11 UTC

    Here is your solution rewritten to use fork instead of threads:

    use strict;
    use warnings;
    use Proc::Fork;
    use IO::Pipe;
    use IO::Select;

    my $kids = 4;                      # How many children we'll create
    $SIG{CHLD} = 'IGNORE';             # don't leave zombies around

    my $sel = new IO::Select();

    # Deal the input files out to the children round-robin.
    my @payloadlist;
    for ( my $i = 0; $i < @ARGV; $i++ ) {
        my $idx = $i % $kids;
        push @{ $payloadlist[$idx] }, $ARGV[$i];
    }

    my $start = localtime();
    print "Started $start $$\n";

    # One child per payload; each reports the files it finishes down a pipe.
    foreach my $payload (@payloadlist) {
        my $pipe = new IO::Pipe;
        child {
            my $p = $pipe->writer;
            foreach my $filename ( @{$payload} ) {
                my $outFile = toTSV($filename);
                my $now     = localtime();
                print "$$ $now - Completed $filename\n";
                print $p $outFile . "\n";
            }
            exit;
        };
        my $r = $pipe->reader;
        $sel->add($r);
    }

    # Parent: collect the completed filenames from the children and import them.
    my $workdone = 0;
    while ( $workdone < @payloadlist ) {
        while ( my @ready = $sel->can_read ) {
            foreach my $fh (@ready) {
                while (<$fh>) {
                    chomp $_;
                    system("echo importing $_\n");
                    unlink $_;
                    $workdone++;
                }
                $sel->remove($fh);
                $fh->close;
            }
        }
    }

    my $end = localtime();
    print "Completed $end\n";

    sub convert { $_[0] }              # placeholder conversion: returns the line unchanged

    sub toTSV {
        my $filename = shift;
        my $outFile  = $filename . '.tsv';
        open my $fhIn,  '<', $filename or warn "$filename : $!" and next;
        open my $fhOut, '>', $outFile  or warn "$outFile : $!"  and next;
        while (<$fhIn>) {
            my $tsv = convert($_);
            print $fhOut $tsv;
        }
        close $fhOut;
        close $fhIn;
        return $outFile;
    }

    Sure, it is more complicated, but it has its advantages too. First, nothing is implicitly shared; and second, it makes you think about the best way to divide the workload. The more equally the workload is divided between the nodes, the better.

      First. ++ for actually providing a solution.

      Sure, it is more complicated but it has its advantages too.
      First, nothing is implicitly shared

      One of the major strengths of iThreads (relative to most other forms of threading) is that very little is implicitly shared. And in the solution I posted above, the only sharing (and all the associated locking) is entirely transparent to the user code.
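
      To illustrate that point, here is a minimal sketch (not taken from either solution in this thread): under iThreads every thread starts with its own copy of each variable, and only data explicitly marked :shared is visible across threads.

      use strict;
      use warnings;
      use threads;
      use threads::shared;

      my $private = 0;           # each thread works on its own copy of this
      my $shared :shared = 0;    # only this one is visible across threads

      # Run four threads to completion, one after the other.
      threads->create( sub { $private++; $shared++ } )->join for 1 .. 4;

      print "private: $private, shared: $shared\n";    # private: 0, shared: 4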

      and second, it makes you think about the best way to divide the workload. The more equally the workload is divided between the nodes, the better.

      There is a flaw in your argument here.

      You are dividing the workload based upon the number of files, but that assumes the files are all roughly the same size. Let's say, for the sake of example, that the files come in groups of four: three small ones and one humongous one. Three of your processes will zip through their set of small files and finish, leaving the fourth to plod its way through all the humongous ones serially while three CPUs sit idle. It doesn't scale.

      Sure, that would be unlucky, and if it were always that way you could take steps to shuffle the pack or otherwise redistribute the files to even out the load. In this example, you might query the file sizes and distribute the files on the assumption that processing time will correlate with file size. But apart from the added complexity of writing a bin-packing algorithm to do the distribution, it still requires that the processing time can be predicted by casual inspection of some external criterion. For some types of processing that will work; for others there may be no causal relationship between file size and processing time. Again, it doesn't scale.
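
      For the record, the sort of size-based split being described might look like the greedy "largest file into the lightest bin" heuristic sketched below. It is illustrative only, and it bakes in exactly the assumption that can break down: that file size predicts processing time.

      use strict;
      use warnings;

      my $kids = 4;
      my @bins;
      push @bins, { size => 0, files => [] } for 1 .. $kids;

      # Largest files first, each into whichever bin currently has the least work.
      for my $file ( sort { ( -s $b || 0 ) <=> ( -s $a || 0 ) } @ARGV ) {
          my ($lightest) = sort { $a->{size} <=> $b->{size} } @bins;
          push @{ $lightest->{files} }, $file;
          $lightest->{size} += -s $file || 0;
      }

      # $bins[$i]{files} would then replace the round-robin split used in the
      # fork example above.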

      It is certainly possible to set up a dynamic queuing system using forks and bi-directional pipes, but this again ramps up the complexity level, and that complexity grows exponentially with the number of CPUs. Get it wrong and debugging becomes a nightmare.

      The beauty of the shared queue is that as each file is completed, the newly free CPU grabs the next one. In most cases the work self-distributes fairly optimally, without operator intervention or prior knowledge, regardless of the number or size of the files, the number of CPUs, or even other load on the machine. If processing time is proportional to file size, you don't need a bin-packing algorithm: just queue the filenames ordered by file size, descending, and the big ones will be processed first, with the smaller ones filling in whenever a thread (CPU) becomes free.
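
      For comparison, the shared-queue pattern amounts to something like the following sketch. It is not the exact code posted earlier in the thread; toTSV here is just a stand-in for the real conversion.

      use strict;
      use warnings;
      use threads;
      use Thread::Queue;

      my $workers = 4;
      my $Q = Thread::Queue->new;

      # Queue the filenames, biggest first, plus one end-marker per worker.
      $Q->enqueue( sort { ( -s $b || 0 ) <=> ( -s $a || 0 ) } @ARGV );
      $Q->enqueue( (undef) x $workers );

      my @threads = map {
          threads->create( sub {
              # Whichever thread is free grabs the next file off the shared queue.
              while ( defined( my $file = $Q->dequeue ) ) {
                  my $out = toTSV( $file );
                  printf "%d completed %s -> %s\n", threads->tid, $file, $out;
              }
          } );
      } 1 .. $workers;

      $_->join for @threads;

      sub toTSV { my $f = shift; return "$f.tsv" }    # placeholder conversion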


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

        See, it made you think :). Your threaded solution assigns a file to a thread to process, so what happens if one of the files is much larger than the sum of all the rest? The whole execution time will be determined by that one big file.

        Now, why did you wander off thinking about bin packing? It is nice that you possess such knowledge, but it seems to be getting in the way a bit. How about treating the whole set of files as a single file and piping it line by line to each of the worker nodes in turn? Would that not divide the load equally?
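
        Roughly, that idea might look like the sketch below: the parent reads every input line and deals them out in turn to the forked workers over pipes. It is illustrative only; process_line stands in for the real per-line conversion.

        use strict;
        use warnings;
        use IO::Handle;

        my $kids = 4;
        my @to_kid;

        for ( 1 .. $kids ) {
            pipe( my $rd, my $wr ) or die "pipe: $!";
            my $pid = fork;
            die "fork: $!" unless defined $pid;
            if ( $pid == 0 ) {                # child: process whatever lines it is sent
                close $wr;
                while ( my $line = <$rd> ) { process_line($line) }
                exit 0;
            }
            close $rd;                        # parent keeps the write end
            $wr->autoflush(1);
            push @to_kid, $wr;
        }

        my $i = 0;
        while ( my $line = <> ) {             # every line of every file in @ARGV
            print { $to_kid[ $i++ % $kids ] } $line;
        }
        close $_ for @to_kid;                 # EOF tells the children to finish
        wait for 1 .. $kids;

        sub process_line { }                  # placeholder for the real per-line work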

        Your suggestion to order the files by size when using threads is certainly an improvement, and it has the advantage of keeping the size and complexity of the code low, but it is still sub-optimal.

        The main reason I stick with my approach is that it sort of forces you to think about the design of the solution, and I mean that in more general terms. Still, your solution is as elegant as it can get for this particular problem.