"Nonetheless, it's unlikely that reducing that overhead is going to make a significant dent in your execution time."

I think that would be worth testing. When you're talking about hundreds of thousands of 'mv $a $b' in a shell vs. the equivalent number of 'rename $a, $b' in a perl script, the time (and overall cpu resources) saved by the latter could be well worth the time it takes to write the perl script -- especially if the process is going to be repeated at regular intervals.
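Just to make that concrete, here's a minimal sketch of what the perl-script side might look like; the input format (one "oldpath<TAB>newpath" pair per line on stdin) is purely an assumption for illustration, not anything dictated by the original problem:

    #!/usr/bin/perl
    # Minimal sketch (assumed input layout): read one "oldpath<TAB>newpath"
    # pair per line on stdin and use the built-in rename() instead of
    # forking an "mv" process for every file.
    use strict;
    use warnings;

    while ( my $line = <STDIN> ) {
        chomp $line;
        my ( $old, $new ) = split /\t/, $line, 2;
        rename $old, $new
            or warn "rename $old -> $new failed: $!\n";
    }

The point of the sketch is simply that each file costs one rename() system call instead of a fork/exec of /bin/mv.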

I have a handy "shell-loop" tool written in perl (posted here: shloop -- execute shell command on a list) which makes it easy to test this, using the standard "time" utility. I happened to have a set of 23 directories ("20*") holding a total of 3782 files, so I created a second set of 23 empty directories ("to*"), and tried the following:

    # rename files from 20* -> to*:
    $ find 20* -type f | time shloop -e rename -s ^20:to
           10.62 real         0.39 user         0.27 sys
    # now "mv" them back:
    $ find to* -type f | time shloop -e mv -s ^to:20
           18.99 real         0.96 user         6.93 sys
This is on a standard intel desktop box running FreeBSD; I expect the results would be comparable (or more dramatic) on other unix flavors.

The first case uses the perl-internal "rename" function to relocate each of the 3782 files to a new directory; in the second case, shloop opens a shell process (open(SH, "|/bin/sh")) and prints 3782 successive "mv" commands to that shell. An interesting point here is that the first case also had the extra overhead of "growing" the new directories as files were added to them for the first time, whereas the target directories for the "mv" run were already big enough -- yet the "mv" run still took almost twice as long (probably because of the overhead of creating and destroying all those sub-processes).
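In other words, the two code paths amount to something like this (a rough sketch, not shloop's actual source -- the path pairs are invented, and in practice you'd use one approach or the other, not both):

    use strict;
    use warnings;

    my @pairs = (
        [ '20.0101/a.txt', 'to.0101/a.txt' ],   # hypothetical old/new paths
        [ '20.0101/b.txt', 'to.0101/b.txt' ],
    );

    # Case 1: perl's built-in rename -- one system call per file, no new process:
    for my $p (@pairs) {
        rename $p->[0], $p->[1]
            or warn "rename $p->[0] -> $p->[1] failed: $!\n";
    }

    # Case 2: open a single shell up front and print one "mv" command per file;
    # the shell still has to fork and exec a separate mv process for each command:
    open( SH, "|/bin/sh" ) or die "can't start /bin/sh: $!\n";
    for my $p (@pairs) {
        print SH "mv $p->[0] $p->[1]\n";
    }
    close(SH);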

This is a bit of a "nonsense" example, of course. Presumably, a shell command like "mv foo/* bar/" (or 23 of them, to handle the above example) would be really quick, because lots of files are moved by a single (compiled) process. But I wrote shloop to handle cases where each individual file needs a name change as well (e.g. rename "foo/*.bar" to "fub/*.bub"). For that sort of case, a pure shell loop approach has to do something like

    o=`echo $i | sed 's/foo\(.*\)bar/fub\1bub/'`; mv $i $o

on every iteration, which would take much longer than the "mv" example shown above, since every single file now costs a sed (and a subshell) on top of the mv.
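For contrast, the same per-file name change can stay entirely inside one perl process -- a regex substitution plus the built-in rename() -- so nothing extra gets forked. A sketch, assuming the foo/*.bar -> fub/*.bub layout above:

    use strict;
    use warnings;

    # Rename foo/*.bar to fub/*.bub in-process: the regex does the name
    # change and rename() does the move, with no sed or mv spawned per file.
    for my $old ( glob "foo/*.bar" ) {
        ( my $new = $old ) =~ s{^foo/(.*)\.bar$}{fub/$1.bub};
        rename $old, $new
            or warn "rename $old -> $new failed: $!\n";
    }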

So the moral is: don't underestimate the load imposed by standard shell utilities -- they don't scale all that well when invoked in large numbers.

