Re^3: looping efficiency (Benchmark Example)
by eyepopslikeamosquito (Archbishop) on Dec 30, 2020 at 02:48 UTC
Don’t Optimize Code -- Benchmark It
-- from Ten Essential Development Practices by Damian Conway
The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times;
premature optimization is the root of all evil (or at least most of it) in programming
-- Donald Knuth
Without good design, good algorithms, and complete understanding of the
program's operation, your carefully optimized code will amount to one of
mankind's least fruitful creations -- a fast slow program.
-- Michael Abrash
Wow, I assumed they were complaining about clarity rather than efficiency! :)
For cheap thrills, assuming all you need are the file names 0000 .. 9999 in order,
I benchmarked the three offered solutions as shown below.
use strict;
use warnings;
use Benchmark qw(timethese);
sub orig {
    for my $i ( 0 .. 9 ) {
        for my $j ( 0 .. 9 ) {
            for my $k ( 0 .. 9 ) {
                for my $l ( 0 .. 9 ) {
                    my $filename = "$i$j$k$l";
                }
            }
        }
    }
}

sub yourmum {
    for ( 0 .. 9999 ) {
        my $filename = sprintf "%04d", $_;
    }
}

sub marshall {
    for ( '0000' ... '9999' ) {
        my $filename = $_;
    }
}

orig();
yourmum();
marshall();

timethese 50000, {
    Orig     => sub { orig() },
    YourMum  => sub { yourmum() },
    Marshall => sub { marshall() },
};
On my machine, the results were:
Benchmark: timing 50000 iterations of Marshall, Orig, YourMum...
  Marshall: 25 wallclock secs (25.16 usr + 0.00 sys = 25.16 CPU) @ 1987.52/s (n=50000)
      Orig: 39 wallclock secs (38.83 usr + 0.00 sys = 38.83 CPU) @ 1287.73/s (n=50000)
   YourMum: 40 wallclock secs (40.08 usr + 0.00 sys = 40.08 CPU) @ 1247.57/s (n=50000)
If all you need are the file names (not the individual digits), it's no surprise that
Marshall's suggestion was the fastest. I also think it's the simplest and clearest of the three.
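If you'd rather see relative speeds than raw timings, Benchmark's cmpthese can be dropped in
place of timethese in the script above -- a minimal sketch reusing the same three subs:

use Benchmark qw(cmpthese);

# Prints a table of rates plus the percentage difference between each pair of subs.
cmpthese 50000, {
    Orig     => sub { orig() },
    YourMum  => sub { yourmum() },
    Marshall => sub { marshall() },
};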
Update: see also perlperf - Perl Performance and Optimization Techniques. Added this node to Performance References.
Oh I totally agree! I was shocked they would argue about efficiency without first benchmarking the whole
program and identifying that code as a chronic bottleneck!
(I only did the benchmark because I was bored and curious on a slow Christmas/New Year holiday).
On Coding Standards and Code Reviews summarises my opinion:
- Correctness, simplicity and clarity come first. Avoid unnecessary cleverness. If you must rely on cleverness, encapsulate and comment it.
- Don't optimize prematurely. Benchmark before you optimize. Comment why you are optimizing.
For much more detail on this topic see:
perl -d:NYTProf yourscript.pl foo bar baz
... followed by ...
nytprofhtml --open
... and look for the code that takes the longest time to run. That's where you want to start optimizing.
Commit your current version to git/SVN/CVS/whatever, modify the worst offender, and run the script through NYTProf again to confirm the speed has improved. Revert and retry if speed goes down. Commit, re-profile, optimize the next top problem from NYTProf. Stop when the code is sufficiently fast.
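If only one phase of a long run is interesting, Devel::NYTProf can also be switched on and off at run time. A minimal sketch -- the two subs are just stand-ins for your real code:

use strict;
use warnings;

# Run as:  NYTPROF=start=no perl -d:NYTProf thisscript.pl
# (start=no keeps profiling off until we explicitly enable it below)

sub boring_setup { my $x = 0; $x += $_ for 1 .. 100_000; return $x }
sub hot_phase    { my $s = ''; $s .= sprintf '%04d', $_ for 1 .. 100_000; return $s }

boring_setup();           # runs unprofiled

DB::enable_profile();     # start collecting profile data here
hot_phase();              # the section we actually care about
DB::disable_profile();    # and stop again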
Alexander
--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Oh yes, Devel::NYTProf is a fantastic tool!
Just for completeness, you can examine the OP codes generated for different snippets of Perl code via B::Terse.
For example:
> perl -MO=Terse -e "sprintf q{%04d}, $_" (Windows)
> perl -MO=Terse -e 'sprintf q{%04d}, $_' (Unix)
LISTOP (0x254e078) leave [1]
    OP (0x2753080) enter
    COP (0x254e0b8) nextstate
    LISTOP (0x254e118) sprintf [2]
        OP (0x254e158) pushmark
        SVOP (0x254e1f8) const [3] PV (0x254f200) "%04d"
        UNOP (0x254e188) null [14]
            PADOP (0x254e1c0) gvsv GV (0xd6af50) *_
-e syntax OK
> perl -MO=Terse -e "qq{$i$j$k$l}" (Windows)
> perl -MO=Terse -e 'qq{$i$j$k$l}' (Unix)
LISTOP (0x2806210) leave [1]
    OP (0x28061e0) enter
    COP (0x2806250) nextstate
    UNOP_AUX (0x28062b0) multiconcat [8]
        OP (0x28062f0) null [3]
        UNOP (0x664fe0) null [14]
            PADOP (0x665018) gvsv GV (0x65fbb0) *i
        UNOP (0x664f70) null [14]
            PADOP (0x664fa8) gvsv GV (0x6549e8) *j
        UNOP (0x664ec0) null [14]
            PADOP (0x664ef8) gvsv GV (0x65ff10) *k
        UNOP (0x2806360) null [14]
            PADOP (0x2806398) gvsv GV (0x654b68) *l
-e syntax OK
> perl -MO=Terse -e "$i.$j.$k.$l" (Windows)
> perl -MO=Terse -e '$i.$j.$k.$l' (Unix)
LISTOP (0x27d23b0) leave [1]
    OP (0x27d2380) enter
    COP (0x27d23f0) nextstate
    UNOP_AUX (0x27d2450) multiconcat [7]
        UNOP (0x25d2d70) null [14]
            PADOP (0x25d2da8) gvsv GV (0x25c4d78) *i
        UNOP (0x25d2d00) null [14]
            PADOP (0x25d2d38) gvsv GV (0x25c49b8) *j
        UNOP (0x25d2c50) null [14]
            PADOP (0x25d2c88) gvsv GV (0x25d01b0) *k
        UNOP (0x27d2490) null [14]
            PADOP (0x27d24c8) gvsv GV (0x25c4e38) *l
-e syntax OK
No great surprise to see qq{$i$j$k$l} and $i.$j.$k.$l generating the same OP codes under the covers.
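As an aside, B::Concise (which recent versions of B::Terse are a thin wrapper around) shows the same information, and its -exec option lists the ops in execution order, which some find easier to follow:

> perl -MO=Concise,-exec -e "sprintf q{%04d}, $_" (Windows)
> perl -MO=Concise,-exec -e 'sprintf q{%04d}, $_' (Unix)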
Updated: added Unix versions of perl command lines
For cheap thrills, assuming all you need are the file names 0000 .. 9999 in order, I benchmarked the three offered solutions as shown below.
WoW!!!
Isn't it amazing what random things can be learnt in the Monastery?
I had no idea it is so easy to benchmark different ways of doing the same thing...although, thinking about it, it's pretty certain that someone would have thought about doing it and created a module to make it easy.
That is one of the beauties of Perl - we are standing on the shoulders of giants (or at least, following the CPAN road).
Thanks especially for that. I really appreciated the benchmarking (and seeing how to benchmark). Thanks to Marshall, too. I have taken on board all of the comments about efficiency, and agree that the exact circumstances deserve most of the weight when considering the various alternatives (but I also don't want to "learn" bad techniques in the first place :) ).
Re^3: looping efficiency
by LanX (Saint) on Dec 30, 2020 at 02:46 UTC
> perl -de 0
main::(-e:1):   0
  DB<1> $_ = '0000'
  DB<2> p $_++
0000
  DB<3> p $_++
0001
  DB<4> p $_++
0002
  DB<5> $_ = 'abcd'
  DB<6> p $_++
abcd
  DB<7> p $_++
abce
  DB<8> p $_++
abcf
I’m going to repeat the excellent advice you have already heard. :P
Dealing with the filesystem and IO is drastically slower than any of these approaches, so arguably you should not be choosing for speed or cleverness but for legibility and future maintenance. Any speed difference won't be noticed unless you are processing billions of files daily. I adore and use features like string incrementing in Perl, though I probably wouldn't do it at work because it's pretty idiomatic. That said, it's also the fastest:
              Rate    glob  unpack   split  n_nested  sprintf  increment
glob        78.0/s      --    -23%    -23%      -85%     -87%       -90%
unpack       101/s     29%      --     -0%      -80%     -84%       -87%
split        101/s     30%      0%      --      -80%     -84%       -87%
n_nested     513/s    558%    409%    407%        --     -18%       -36%
sprintf      623/s    698%    517%    515%       21%       --       -22%
increment    797/s    922%    690%    687%       55%      28%        --
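For reference, a minimal sketch of the string-increment version (the loop body is just a placeholder):

use strict;
use warnings;

# Perl's magic string increment keeps the leading zeros for us.
my $name = '0000';
for ( 1 .. 10_000 ) {
    # ... open/write "$name.ext" here ...
    $name++;    # '0000' -> '0001' -> ... -> '9999'
}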
Re^3: looping efficiency
by Marshall (Canon) on Dec 30, 2020 at 05:22 UTC
I had no idea that you were going to generate 10,000 separate files!
A single SQLite DB file is likely to be far, far more efficient for whatever you are doing.
I will have to check that out, but (superficially) I wouldn't think so. Each run of the entire script typically involves 2000-7000 files (not the entire 10,000), but the files range from 3 MB to over 5 MB, which would make a fairly massive database. The secondary processing of the files after writing takes advantage of the sequential numbering, and the operations involved don't really lend themselves to database lookups (and involve other external programs). In the second stage I typically process subsets of 400 to 1000 files at a time, just to keep the final output files (which combine the data from the original files) to a reasonable size.
Geez, taking a mid-range estimate, 5,000 files times 4 MB each is 20,000 MB => 20 GB. You are writing that much data to the file system in the first place, so your app is much bigger than I thought. But a DB can handle that, even SQLite on a 64-bit build. The processing that you do with this data is unclear, as is the size of the result set.
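To make that concrete, a minimal sketch using DBI with DBD::SQLite (the table and column names are purely illustrative):

use strict;
use warnings;
use DBI;

# Illustrative schema: one row per generated "file", keyed by its 0000..9999 name.
my $dbh = DBI->connect( 'dbi:SQLite:dbname=data.db', '', '',
                        { RaiseError => 1, AutoCommit => 0 } );

$dbh->do('CREATE TABLE IF NOT EXISTS chunks (name TEXT PRIMARY KEY, body BLOB)');

my $sth = $dbh->prepare('INSERT INTO chunks (name, body) VALUES (?, ?)');
for my $name ( '0000' .. '0009' ) {            # small range just for the sketch
    $sth->execute( $name, "data for $name" );  # real code would insert the 3-5 MB blob
}
$dbh->commit;
$dbh->disconnect;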