
As a relative newcomer to Unix, I have found forking, threading, and parallel versus serial execution to be something of a murky black art. Only recently have I felt like I have begun to get a handle on things.

Anyway, I wrote a script to demonstrate the difference between running a series of five simple two-second commands "serially" (one after the other) versus running them in parallel. Serially, it takes ten seconds. In parallel, it takes two. So obviously, this forking stuff is something that can save time if mastered :)

To understand this script, it's important to know that in bash, you can run a command in the background (which forks a new process) by ending it with the & character. There may be other ways to fork a process, but that's the way I use. Also, commands inside () run in a subshell, which is its own process. So if you have two commands, cmd1 and cmd2, and you want to run them in parallel, you can do this with

( ( cmd1 ) & ); ( ( cmd2 ) & );
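
For comparison, here is a minimal sketch of the same idea in plain bash, outside of Perl. It assumes bash, uses sleep as a stand-in for real work, and the function names run_serial and run_parallel are made up for illustration. Unlike the recipe above, it backgrounds each subshell directly so that the shell's wait builtin can block until all of them finish.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; sleep stands in for real work.

# Serial: each command starts only after the previous one has finished.
run_serial() {
  sleep 1
  sleep 1
  sleep 1
}

# Parallel: each subshell is put in the background with &,
# then wait blocks until every background child has exited.
run_parallel() {
  ( sleep 1 ) &
  ( sleep 1 ) &
  ( sleep 1 ) &
  wait
}

# SECONDS is a bash variable counting whole seconds since the shell started.
start=$SECONDS; run_serial;   serial_elapsed=$(( SECONDS - start ))
start=$SECONDS; run_parallel; parallel_elapsed=$(( SECONDS - start ))

echo "serial: ${serial_elapsed}s, parallel: ${parallel_elapsed}s"
```

With the nested ( ( cmd ) & ) form, the background job is a child of the outer subshell rather than of the calling shell, so wait in the calling shell would have nothing to wait for. That is fine when the command is run through backticks, because backticks keep reading until every process holding the output pipe has closed it.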

The ( ( cmd ) & ); recipe can be applied to as many commands as you want, and the commands inside the parentheses can be arbitrarily complex. Since this transformation seemed like the kind of thing I might want to do more than once, I wrote a function to do it: parallelize_bash_commands in the script below. The resulting bash command then gets run from the Perl script using backticks. Simple :)
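
The string-building step can also be sketched directly in bash. This is only an illustration of the transformation, not the function from the script: the name parallelize_cmds is hypothetical, and it assumes each argument is a complete shell command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wrap every argument as "( ( cmd ) & );"
# and join the pieces into one command string.
parallelize_cmds() {
  local out="" cmd
  for cmd in "$@"; do
    out+="( ( ${cmd} ) & );"
  done
  printf '%s\n' "$out"
}

combined=$(parallelize_cmds "echo one" "echo two")
echo "$combined"

# Command substitution plays the role of Perl's backticks here: it reads
# until every background job has closed its standard output.
output=$(bash -c "$combined")
echo "$output"
```

The order of the lines in the output is not guaranteed, since the background jobs race each other, just as d and e swap places in the run shown below.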

I am curious what the other monks think of this, and how they deal with this issue of forking to get speedup. I am sure there are modules on CPAN that accomplish this same kind of thing, and I am curious what is being used out there.

Anyway, I hope this simple little demo of forking and parallelization helps some beginners out there in Perl land. And maybe in the responses I will learn of better ways to accomplish this than what I proposed.

Long live Perl!

The Output:

$./parallelize_em_demo.pl
touch a; ls -l a; sleep 2
-rw-r--r--  1 hartmann users 0 2006-06-09 17:58 a
touch b; ls -l b; sleep 2
-rw-r--r--  1 hartmann users 0 2006-06-09 17:58 b
touch c; ls -l c; sleep 2
-rw-r--r--  1 hartmann users 0 2006-06-09 17:58 c
touch d; ls -l d; sleep 2
-rw-r--r--  1 hartmann users 0 2006-06-09 17:58 d
touch e; ls -l e; sleep 2
-rw-r--r--  1 hartmann users 0 2006-06-09 17:58 e

time elapsed serial: 10

( ( touch a; ls -l a; sleep 2 ) & );( ( touch b; ls -l b; sleep 2 ) & );( ( touch c; ls -l c; sleep 2 ) & );( ( touch d; ls -l d; sleep 2 ) & );( ( touch e; ls -l e; sleep 2 ) & );
-rw-r--r--  1 hartmann users 0 2006-06-09 17:58 a
-rw-r--r--  1 hartmann users 0 2006-06-09 17:58 b
-rw-r--r--  1 hartmann users 0 2006-06-09 17:58 c
-rw-r--r--  1 hartmann users 0 2006-06-09 17:58 e
-rw-r--r--  1 hartmann users 0 2006-06-09 17:58 d

time elapsed parallel: 2

$

The script:

hartmann@ds0050:~/learning/forkArena> cat parallelize_em_demo.pl
#!/usr/bin/perl
use strict;
use warnings;
use Carp qw(confess);

my @commands = map {
  "touch $_; ls -l $_; sleep 2";
} qw(a b c d e);

my ( $time_start, $time_elapsed_serial, $time_elapsed_parallel);

$time_start=time();
#does first, waits two seconds, does the second, waits two seconds, etc. (should take about ten seconds)
for my $bash_command ( @commands  ) {
  run_bash_command($bash_command);
}
$time_elapsed_serial = time()-$time_start;
print "\ntime elapsed serial: $time_elapsed_serial\n\n";

#does commands in parallel. (should take about two seconds)
my $parallel_running_command = parallelize_bash_commands([@commands]);

$time_start = time();
run_bash_command($parallel_running_command);
$time_elapsed_parallel = time()-$time_start;
print "\ntime elapsed parallel: $time_elapsed_parallel\n\n";

sub run_bash_command {
  my $command = shift or die "no command";
  print "$command\n";
  print `$command`;
}

sub parallelize_bash_commands {
  my $commands = shift or confess "no commands";
  ref($commands) eq 'ARRAY' or confess "not an array";

  my $parallel_running_command = "";
  for my $command ( @$commands  ) {
    $parallel_running_command .= "( ( $command ) & );";
  }
  return $parallel_running_command;
}

In reply to Using perl to speed up a series of bash commands by transforming them into a single command that will run everything in parallel. by tphyahoo
