A novice breaks his silence, seeking wisdom in the typing of many monkees...
I am after recommendations for a structural rewrite to improve performance.
Priorities are 0) maintainability, 1) max speed, 2) min memory. This is in a data-processing
context, not web pages etc.
Problem: read a large number of input items (500K), transform them into
output items and save them into load tables in the database.
Focus: optimal loop structure and transformation.
Current overview:
    load all items into array of hash
    for each input item {
        complex transformation rules
    }
    save all items (timestamped)
    remove all items with old timestamp
Some alternatives I am testing for the loop:
OUTER-IF:
    if ( $type eq 'apple' ) {
        for my $item ( @items ) {
            apple_wash($item);
            apple_core($item);
            apple_pulp($item);
        }
    } elsif ( $type eq 'banana' ) {
        for my $item ( @items ) {
            banana_bend($item);
            banana_hang($item);
        }
    } else {
        die "bad fruit: $type";
    }
INNER-IF:
    for my $item ( @items ) {
        if ( $type eq 'apple' ) {
            apple_wash($item);
            apple_core($item);
            apple_pulp($item);
        } elsif ( $type eq 'banana' ) {
            banana_bend($item);
            banana_hang($item);
        } else {
            die "bad fruit: $type";
        }
    }
ARRAY OF SUB-REF:
    my @funcs;
    if ( $type eq 'apple' ) {
        push @funcs, \&apple_wash, \&apple_core, \&apple_pulp;
    } elsif ( $type eq 'banana' ) {
        push @funcs, \&banana_bend, \&banana_hang;
    } else {
        die "bad fruit: $type";
    }
    for my $item ( @items ) {
        for my $func ( @funcs ) {
            $func->($item);
        }
    }
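A related variant worth trying is a dispatch table: a hash keyed by type whose values are the ordered step lists, so the if/elsif chain disappears and an unknown type fails on the lookup. This is only a minimal sketch; the step subs are hypothetical stand-ins that record their name on the item, and `%pipeline` and `transform_all` are names made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the real transformation steps; each one
# just records its name on the item so the call order is visible.
sub apple_wash  { push @{ $_[0]{steps} }, 'wash' }
sub apple_core  { push @{ $_[0]{steps} }, 'core' }
sub apple_pulp  { push @{ $_[0]{steps} }, 'pulp' }
sub banana_bend { push @{ $_[0]{steps} }, 'bend' }
sub banana_hang { push @{ $_[0]{steps} }, 'hang' }

# One table entry per fruit type: the ordered list of steps to apply.
my %pipeline = (
    apple  => [ \&apple_wash,  \&apple_core, \&apple_pulp ],
    banana => [ \&banana_bend, \&banana_hang ],
);

# Look up the step list once, outside the loop, then apply it to every item.
sub transform_all {
    my ( $type, $items ) = @_;
    my $funcs = $pipeline{$type} or die "bad fruit: $type";
    for my $item (@$items) {
        $_->($item) for @$funcs;
    }
    return $items;
}

my @items = ( { id => 1 }, { id => 2 } );
transform_all( 'banana', \@items );
print join( ',', @{ $items[0]{steps} } ), "\n";    # prints "bend,hang"
```

The per-item cost is the same as the ARRAY OF SUB-REF loop (one code-ref call per step), so this is about maintainability rather than speed: adding a fruit means adding one table row.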
SYMBOL-TABLE FIDDLE:
    if ( $type eq 'apple' ) {
        *func1 = \&apple_wash;
        *func2 = \&apple_core;
        *func3 = \&apple_pulp;
    } elsif ( $type eq 'banana' ) {
        *func1 = \&banana_bend;
        *func2 = \&banana_hang;
        *func3 = \&noop;
    } else {
        die "bad fruit: $type";
    }
    for my $item ( @items ) {
        func1($item) unless \&func1 == \&noop;
        func2($item) unless \&func2 == \&noop;
        func3($item) unless \&func3 == \&noop;
    }
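Glob assignments like these warn when the target name already has a sub in it. A minimal sketch of the usual way to quiet that, assuming the warning comes from the 'redefine' category: turn that one category off lexically around the assignment. The step subs and `bind_steps` here are hypothetical, and only one alias is shown to keep it short.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the real steps.
sub apple_wash  { $_[0]{washed} = 1 }
sub banana_bend { $_[0]{bent}   = 1 }

# Placeholder so func1 exists at compile time; it is replaced before use.
sub func1 { die "func1 not bound yet" }

sub bind_steps {
    my ($type) = @_;
    my $target = $type eq 'apple'  ? \&apple_wash
               : $type eq 'banana' ? \&banana_bend
               :                     die "bad fruit: $type";

    # Suppress only the 'redefine' category, and only inside this sub.
    no warnings 'redefine';
    *func1 = $target;
    return;
}

bind_steps('apple');
my %item;
func1( \%item );    # now runs apple_wash
```

All other warnings stay enabled, and the suppression ends with the enclosing block, so an accidental redefinition elsewhere in the file is still reported.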
Specifics: I have profiled some test code with these four approaches:
OUTER-IF is (slightly) the fastest, but unwieldy in the real instance, as there
are hundreds of infrastructural lines omitted from the example that make it hard to
maintain duplicate FOR LOOPs.
INNER-IF is also fast, but results in long, difficult-to-maintain code inside the FOR LOOP.
ARRAY OF SUB-REF is slower than the IFs (must be the cost of dereferencing the function ref?), but makes the huge FOR LOOP mucho clearer.
SYMBOL-TABLE FIDDLE: I am a symbol-table virgin, and my symbol-table stuff looks poor. Is there
better symbol-table syntax? It generates 'Redefined function XXX' warnings.
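For anyone wanting to reproduce the comparison, here is a minimal sketch of how two of the loop shapes could be timed with the core Benchmark module. This is not the test code that produced the observations above: the step subs are trivial hypothetical stand-ins, the item count is cut to 1000, and real transformation steps may well dominate the loop overhead being measured.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $type  = 'apple';
my @items = map { +{ n => $_ } } 1 .. 1000;    # small stand-in data set

# Hypothetical, deliberately cheap steps so loop overhead is visible.
sub apple_wash { $_[0]{n}++ }
sub apple_core { $_[0]{n}++ }
sub apple_pulp { $_[0]{n}++ }

# INNER-IF shape: the type test runs once per item.
sub inner_if {
    for my $item (@items) {
        if ( $type eq 'apple' ) {
            apple_wash($item);
            apple_core($item);
            apple_pulp($item);
        }
        else {
            die "bad fruit: $type";
        }
    }
}

# ARRAY OF SUB-REF shape: the step list is chosen once, then called by reference.
sub sub_refs {
    my @funcs = ( \&apple_wash, \&apple_core, \&apple_pulp );
    for my $item (@items) {
        $_->($item) for @funcs;
    }
}

# A negative count means "run each candidate for at least that many CPU seconds".
cmpthese( -1, { inner_if => \&inner_if, sub_refs => \&sub_refs } );
```

cmpthese prints a rate table with percentage differences between the candidates, which makes it easier to judge whether a gap is large enough to justify the less maintainable layout.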
Oh monks of Perl, I seekest thy wisdom...
TIA
Jeff
edited: Tue Dec 17 15:23:04 2002
by jeffa - title truncation (was: performance - loops and complex decisions, sub refs, symbol tables, inner and outer if/elses)
update 2 (broquaint): added <readmore> tag