Re^2: Using Perl to run a Windows command-line utility many times with ordered, parallel execution

by Jim (Curate)
on Feb 02, 2014 at 01:28 UTC


in reply to Re: Using Perl to run a Windows command-line utility many times with ordered, parallel execution
in thread Using Perl to run a Windows command-line utility many times with ordered, parallel execution

Thank you, ambrus. This is precisely the example I needed to help me get started.

It wasn't clear enough from my original post that my problem isn't just that I don't understand how to do Windows process control using Perl. My problem is that I don't understand process control well at all. And when I read about it in documentation—not just Perl documentation, any documentation—my head explodes. I struggle with the unfamiliar lingo. If there's a good tutorial for absolute beginners, I haven't found it yet. But with the help of your straightforward Perl code snippet, I was able to make a good start.

So here's the script I cobbled together based on your example. It has extra junk in it that's only there for self-educational purposes. Also, there are actually thousands of lines of DATA (i.e., external commands to be run), not just these few.

use strict;
use warnings;

use English qw( -no_match_vars ); # For $CHILD_ERROR
use POSIX ();

my $BATCH_SIZE = 8;

my @commands;

LINE:
while (<DATA>) {
    next LINE if m/^\s*#/;

    chomp;

    my ($txt_file, $tab_file, $total_documents) = split m/,/, $_, 3;

    my $command = "doit $txt_file > $tab_file";

    push @commands, [ $command, $txt_file, $total_documents ];
}

while (@commands) {
    my @pids;
    my %txt_file_by;

    for my $cmd (splice @commands, 0, $BATCH_SIZE) {
        my ($command, $txt_file, $total_documents) = @$cmd;

        my $pid = system(1, $command);

        push @pids, $pid;

        my $timestamp = POSIX::strftime('%H:%M:%S', localtime);

        print "$timestamp\t$pid\t$command\n";

        $txt_file_by{$pid} = $txt_file;
    }

    for my $pid (@pids) {
        $pid == waitpid($pid, 0) or die;
        die if $CHILD_ERROR;

        my $timestamp = POSIX::strftime('%H:%M:%S', localtime);

        print "$timestamp\t$pid\t$txt_file_by{$pid}\n";
    }
}

exit 0;

__DATA__
D000349000.txt,D000349000.tab,564530
Z0000042.txt,Z0000042.tab,457277
Z0000013336.txt,Z0000013336.tab,457277
Z0000013426.txt,Z0000013426.tab,382292
D000250000.txt,D000250000.tab,382014
C000004770.txt,C000004770.tab,356580
Z000003462.txt,Z000003462.tab,356580
Z000004770.txt,Z000004770.tab,356580
Z0000012073.txt,Z0000012073.tab,349325
D000303000.txt,D000303000.tab,347852
Z0000013787.txt,Z0000013787.tab,347852
Z0000014288.txt,Z0000014288.tab,289025
D004607000.txt,D004607000.tab,268763
D000245000.txt,D000245000.tab,258363
Z0000012214.txt,Z0000012214.tab,257861
Z0000013342.txt,Z0000013342.tab,257861
Z0000015322.txt,Z0000015322.tab,243612
D000275000.txt,D000275000.tab,242962
D000272000.txt,D000272000.tab,224791
D000271000.txt,D000271000.tab,223537
D000717000.txt,D000717000.tab,216624
Z0000015315.txt,Z0000015315.tab,215390
D004457000.txt,D004457000.tab,211271
Z0000012004.txt,Z0000012004.tab,211271

Until I implemented this, ran it, and watched it closely in action, I couldn't figure out either system() or waitpid(). I don't grok them, but I more-or-less understand what they're accomplishing. It's still unclear to me what the first argument of system(), 1, is for, and I also don't understand what the second argument of waitpid(), 0, is intended to do. An explanation of these mysterious arguments would be helpful.
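For anyone else puzzling over the same two arguments, here is a minimal sketch of how I currently read them; it is not an authoritative explanation. As far as perlport describes it, on Win32 a leading 1 makes system() spawn the command and return its process designator at once instead of waiting. The second argument of waitpid() is a FLAGS value, and 0 means "block until that child has finished". The helper name spawn_nowait and the fork/exec fallback for non-Windows systems are my own assumptions, added only so the snippet can run anywhere.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper (not part of the script above): start a command
# without waiting for it and return a value that waitpid() accepts.
sub spawn_nowait {
    my ($command) = @_;

    # Win32: per perlport, system(1, ...) spawns the process and
    # returns its process designator immediately.
    return system(1, $command) if $^O eq 'MSWin32';

    # Elsewhere the leading 1 has no special meaning, so fork and exec.
    my $pid = fork();
    die "fork failed: $!\n" if !defined $pid;
    if ($pid == 0) {
        # Child: replace ourselves with the command; bail out if exec fails.
        exec($command) or POSIX::_exit(127);
    }
    return $pid;
}

# Run this same perl as the child so the example needs no other program.
my $pid = spawn_nowait(qq{"$^X" -e "exit 3"});

# FLAGS = 0: block until this particular child has finished.
my $reaped    = waitpid($pid, 0);
my $exit_code = $? >> 8;    # the exit code lives in the high byte of $?

print "reaped $reaped, exit code $exit_code\n";
```

Passing POSIX::WNOHANG instead of 0 would make waitpid() return immediately rather than block, which is what a polling loop would use.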

What are examples of appropriate messages to use with the two calls to die()? I don't fully understand what's being tested and could fail at those points in the script. More generally, how might I flesh out the error handling in the script to make it more robust?
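To make the question concrete, here is a rough sketch of what the two die() calls might say. The first check fails when waitpid() does not hand back the pid that was asked for (it returns -1 if there is no such child), and the second fails when the child's wait status in $CHILD_ERROR is nonzero. The helper names describe_status and wait_or_die are hypothetical and the wording of the messages is only a suggestion.

```perl
use strict;
use warnings;
use English qw( -no_match_vars );    # For $CHILD_ERROR and $OS_ERROR

# Hypothetical helper: turn a wait status ($CHILD_ERROR) into text.
sub describe_status {
    my ($status) = @_;

    return 'could not be started or waited for' if $status == -1;

    my $signal = $status & 127;      # low 7 bits: terminating signal
    return "was killed by signal $signal" if $signal;

    my $exit_code = $status >> 8;    # high byte: exit code
    return $exit_code == 0 ? 'succeeded' : "exited with code $exit_code";
}

# Hypothetical helper: wait for one child and die with context on failure.
sub wait_or_die {
    my ($pid, $label) = @_;

    my $reaped = waitpid($pid, 0);
    $reaped == $pid
        or die "waitpid($pid) for '$label' returned $reaped: $OS_ERROR\n";

    $CHILD_ERROR == 0
        or die "'$label' (pid $pid) " . describe_status($CHILD_ERROR) . "\n";

    return;
}

print describe_status(0),      "\n";    # succeeded
print describe_status(3 << 8), "\n";    # exited with code 3
print describe_status(9),      "\n";    # was killed by signal 9
```

In the script above, the second loop could then call wait_or_die($pid, $txt_file_by{$pid}) so that a failure names the input file being processed.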

What's the difference between a process and a thread? When and why would I choose to use multiple processes rather than multiple threads and vice versa? I'm running Microsoft Windows, not Unix or Linux. How much does this matter?

If there's an easier or slicker way to compute a timestamp than how I did it here using POSIX::strftime() and localtime(), I'd appreciate a tip.
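One alternative I'm aware of, shown only as a sketch: the core module Time::Piece overrides localtime() so that it returns an object, and that object has an hms() method producing the same HH:MM:SS string. Whether that counts as slicker than POSIX::strftime() is a matter of taste.

```perl
use strict;
use warnings;
use POSIX ();
use Time::Piece;    # overrides localtime() in this package

my $now = time;

# The approach used in the script, with the built-in localtime spelled
# out explicitly because Time::Piece has replaced the plain name.
my $via_posix = POSIX::strftime('%H:%M:%S', CORE::localtime($now));

# Time::Piece: localtime() returns an object; hms() formats HH:MM:SS.
my $via_time_piece = localtime($now)->hms;

print "$via_posix\n$via_time_piece\n";
```

Both lines print the same timestamp, since they format the same epoch value.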

Thank you again for your help.

Jim

Replies are listed 'Best First'.
Re^3: Using Perl to run a Windows command-line utility many times with ordered, parallel execution
by ambrus (Abbot) on Feb 02, 2014 at 09:10 UTC

    For the first argument of system being 1, please see perldoc perlport.

      [Jim] It's still unclear to me what the first argument of system(), 1, is for…
      [ambrus] For the first argument of system being 1, please see perldoc perlport.

      I know about this fleeting reference; I've read it before. But this is all it says about wait() and waitpid():

      Can only be applied to process handles returned for processes spawned using system(1, ...) or pseudo processes created with fork(). (Win32)

      Huh? It doesn't explain the peculiar and non-orthogonal first argument, 1.

      Thank you again, ambrus.

      Jim
