Re: Code Efficiency: Trial and Error?
by dws (Chancellor) on Oct 11, 2002 at 20:32 UTC
What I'm interested to know is ...
- how do other Monks go about optimising their code?
Until I know that there's really a problem, I don't.
If there is a problem, then measure before making any code changes. Sometimes the bottleneck is where you think it is, but much of the time it won't be. Then measure again.
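For instance, a minimal sketch of the kind of measurement meant (the run_report sub and the timing are made up here; Time::HiRes does the clock work):

use strict;
use Time::HiRes qw(gettimeofday tv_interval sleep);

# stand-in for the code you suspect is slow
sub run_report { sleep 0.25 }

my $t0 = [gettimeofday];
run_report();
printf "run_report took %.3f seconds\n", tv_interval($t0);

Measure, change one thing, then run the same measurement again.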
- when faced with more than one obvious way to do something, what's the determining factor?
Pick the way that's clearer. Code has multiple audiences: one audience is the computer, another is the person who picks up the code next. I try to make sure things are correct for the former, and clear and concise for the latter. If you write all of your code with the assumption that it might be tacked up on a wall for passers-by to comment on, you won't go too wrong.
Re: Code Efficiency: Trial and Error?
by runrig (Abbot) on Oct 11, 2002 at 20:54 UTC
- how do other Monks go about optimising their code?
- when faced with more than one obvious way to do something, what's the determining factor?
- surely this has to be more than "trial and error" ..? *grin*
When optimizing, you have to ask yourself "How much time will this really save?", and weigh that against "How maintainable is each method?".
And then there's "How much time do I have to play with this?" :-)
I was recently trying to optimize some homegrown Korn shell menu scripts of ours where at one point the user had to wait for about 10 seconds for a prompt (I did end up getting it down to 2-3 seconds). There was one function which was looking for the alias (filename) for a certain customer code (last field on a certain line). The function was doing this:
grep " $code$" * | awk -F: '{print $1}'
So it was grep'ing through all the files in a directory even though it could quit when it hit the first matching line. So I rewrote it as this (also realizing that the target line always started with 'Customer Code'):
awk '/^Customer Code.* '$code'$/{print FILENAME; exit}' *
But when you hit the 'Customer Code' line in one file, and it's not the one you want, you'd like to close the file right there and move on to the next file, especially because the 'Customer Code' line was always the second or third line in a 40-100 line file. gawk has a 'nextfile' statement which does this, but I'm stuck with awk for now. So let's try perl:
perl -e '
    $site = shift;
    opendir DIR, ".";
    @ARGV = grep { -f } readdir DIR;   # plain files only; skip . and ..
    closedir DIR;
    while (<>) {
        if (/^Customer Code.* (\w+)$/) {
            # this is the Customer Code line: print and quit if it is the
            # site we want, otherwise close the file so <> moves on
            print("$ARGV\n"), exit if $site eq $1;
            close ARGV;
        }
    }
' $code
This, on average, goes twice as fast as the original, but at the cost of readability (especially since no one else here knows perl all that well). And then it turns out that this function was not even called during that particularly slow prompt, and was only being called once per execution (in another place), so I'd be saving a whole 0.03 seconds (which the user wouldn't even notice) by doing this with perl. But I'm leaving the perl in for now, along with the old line commented out, with a comment to the effect of "it's a lot more fun doing it this way" :-)
Update: As a (hopefully) final note, even though the above code wasn't slowing down the especially slow prompt, I did finally speed up that slow prompt to being almost instantaneous by replacing the offending section with perl. The problem was that there were about 90 'customer' files, and it was fork/exec'ing grep/awk/etc for each file, so I just read each file, saved what I needed in a hash, and printed the thing out at the end:
site_list=$(
    perl -e '
        while (<>) {
            if ($ARGV eq "/etc/uucp/Systems") {
                # remember each system name from the Systems file
                $system{$1} = undef if /^(\w+)/;
                next;
            }
            close ARGV, next unless exists $system{$_};
            # grab the customer code and stop reading this file
            $system{$ARGV} = $1, close ARGV if /^Customer Code.*\s(\w+)\s*$/;
        }
        $, = " ";
        print sort(values %system), "\n";
    ' /etc/uucp/Systems *)
So some things are worth optimizing. It saves only about 3 seconds of actual time (about 10 compared with the original), but the annoyance it saves is priceless :-)
I wouldn't have used perl or awk, but stayed with grep: grep -l would have done the same as your Perl script, but is likely to be more efficient.
Abigail
I like the simplicity of grep -l, though as Aristotle points out, it still scans all files (it's probably what I'll end up with anyway, just for the sake of maintenance, since '-l' short-circuiting the match within each file is 'good enough'). If I just look for /_$code$/ then it is about as fast as the perl script when all the files need to be scanned anyway (and perl isn't all that much quicker even when the match occurs within the first few files). But when I change it to "^Customer Code.* $code$" then it is (~3x) slower. grep and sed are good at very simple regexes, but perl seems to outperform them when they become even mildly complex.
Update { should have benchmarked properly, apparently.. rest of this post largely invalidated by runrig's reply. }
Abigail-II's proposition of grep -l does not fit your specs as it will still scan all files, but your Perl can be simplified.
perl -e '
    $site = shift;
    for $f (@ARGV) {
        local @ARGV = $f;
        /^Customer Code/ && last while <>;
        / \Q$site\E$/ && (print("$f\n"), last);
    }
' $code *
But why all that? A short-circuiting sed script to find a customer's code:
sed 's/^Customer Code[^ ]* //; t done; d; : done; q;' FILE
Wrap some sh around it and it does the job:
for ALIAS in * ; do
    [ "`sed 's/^Customer Code[^ ]* //; t done; d; : done; q' "$ALIAS"`" = "$code" ] && break
done
echo "$ALIAS"
(Wrote this in bash, not sure if it works 1:1 in Korn, but it should be easy to port anyway.)
Update: Btw: when you're passing {print $1} to awk, it's a sign you really wanted to use cut - in your case, that would be
cut -d: -f1
Makeshifts last the longest.
I didn't do the file globbing outside of the perl script because that seemed to be slower, and a readdir solution seemed to be faster than a glob inside of the script. I corrected and rewrote your shell solution for Korn:
for ALIAS in *
do
[[ $(sed -n 's/^Customer Code.* //
t done
d
: done
p
q' "$ALIAS") = $code ]] && break
done
print $ALIAS
But this seems to be the slowest solution of all. Probably due to having to fire up a sed process so many times, and maybe also due to the specific regex. But like I say in my reply to Abigail-II's post above, I'll probably end up with grep -l just for the simplicity of it.
Update: In addition to Aristotle's updated notes, another sign is that if you pipe grep to or from sed, awk, and maybe even cut, you are probably doing too much work, and may be better off just using one of the aforementioned commands instead.
Re: Code Efficiency: Trial and Error?
by JaWi (Hermit) on Oct 11, 2002 at 20:38 UTC
Re: Code Efficiency: Trial and Error?
by Aristotle (Chancellor) on Oct 12, 2002 at 01:32 UTC
Three rules govern the way I write my code:
- Pick the best / most appropriate algorithm
- Come up with a natural representation using Perl's data types
- Write code that's as clear and self-explanatory as possible
Note that "clear and self-explanatory" doesn't mean "clear and self-explanatory to someone who's never seen Perl before".
I make use of common "magic" features like $_ (which is only the simplest example) extensively. I use next / last to avoid unnecessary indentation. I use statement modifiers a ton. I use the ternary operator quite frequently.
But not to shave a couple of characters off: to shave a couple of red-tape tokens off, thereby baring the real goings-on to the eye. All my indentation and other style rules follow this maxim. I try to lay out my code such that I can absorb the structure while scrolling, without even really reading.
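A made-up illustration of the difference (the data and names are invented, not from any real script):

my @lines = ('# a comment', 'foo', 'foo', 'bar');

# with the red tape spelled out
my (%seen, @keep);
foreach my $line (@lines) {
    if ($line =~ /^\s*#/) {
        next;
    }
    if (exists $seen{$line}) {
        next;
    }
    $seen{$line} = 1;
    push @keep, $line;
}

# the same loop leaning on $_, next and a statement modifier
my (%seen2, @keep2);
for (@lines) {
    next if /^\s*#/ or $seen2{$_}++;
    push @keep2, $_;
}

Both do the same thing; the second one lets the structure register at a glance.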
Makeshifts last the longest.
Re: Code Efficiency: Trial and Error?
by oakbox (Chaplain) on Oct 11, 2002 at 20:57 UTC
I find that I stick to a few simple constructs: foreach over for or while, if-else in brackets over the ?: notation, etc. I write code faster when I'm using tools that I am very familiar with and only jump to other coding styles when there is a real need for optimization.
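A trivial, made-up example of the kind of choice I mean; both are fine, and consistency matters more than which one you pick:

my $count = 3;

# if-else in brackets
my $label;
if ($count == 1) {
    $label = 'item';
}
else {
    $label = 'items';
}

# the ?: notation
my $label2 = $count == 1 ? 'item' : 'items';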
The main criterion is consistency. If you are consistent in your style, you can go back to your own code and immediately grok what's going on. Consistency also helps other people reading your code.
The question wasn't 'correct vs. bad' coding. If you are consistently writing bad code, then you need to change :) But if the choice is between 'good' and 'also good', consistency wins.
oakbox
Well... I often ask myself the same question. I often do things in a very readable way that works well, no bugs, etc. And then my boss will come along, hack the code, make it two times shorter, and end up with something faster and more efficient. Often not as readable, though. I also often wonder, when building regexes, if I'm using the fastest way. It would be nice to have examples of two regexes doing the same thing but one faster than the other, with an explanation of why it is faster, so that we can understand how to write more efficient code.
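For instance, here is the kind of made-up comparison I mean (a character class versus an alternation that match the same thing), using the Benchmark module:

use Benchmark qw(cmpthese);

# a long string with no digits, so both patterns have to scan all of it
my $text = 'x' x 100_000;

cmpthese(-2, {
    char_class  => sub { $text =~ /[0-9]/ },
    alternation => sub { $text =~ /(?:0|1|2|3|4|5|6|7|8|9)/ },
});

The character class should win, because the engine makes one class test per position instead of trying ten alternatives, but I'd want the benchmark to confirm it.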
Re: Code Efficiency: Trial and Error?
by ignatz (Vicar) on Oct 11, 2002 at 20:34 UTC
How do you get to Carnegie Hall?
()-()
\"/
`
Re: Code Efficiency: Trial and Error?
by trs80 (Priest) on Oct 12, 2002 at 16:04 UTC
Q1) how do other Monks go about optimizing their code?
Q2) when faced with more than one obvious way to do something, what's the determining factor?
Q3) surely this has to be more than "trial and error" ..? *grin*
A1) Since, reading between the lines, it seems you are working with a database most of the time, there is much more to getting efficient code than just the Perl part of it. You need to be concerned with the network (if it is a remote database) and the indexing of the tables; these can make a big difference in the speed of your code. In one recent project I got a tenfold difference in speed on a report by adding the appropriate indexes to the tables. As far as Perl efficiency is concerned, this is also very dependent on the type of data you are dealing with. I recommend you use a profiler to find out where the most time is being spent in your code. You might want to check out Devel::AutoProfiler.
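As a rough sketch of the indexing side of that (the DSN, table and column names below are invented; the point is just that one well-placed index can buy more than any amount of Perl tweaking):

use DBI;

# connect to the (hypothetical) reporting database
my $dbh = DBI->connect('dbi:mysql:report', 'user', 'password',
                       { RaiseError => 1 });

# the report filters and joins on customer_id, so index that column
$dbh->do('CREATE INDEX orders_customer_idx ON orders (customer_id)');

$dbh->disconnect;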
A2) For me, unless there is a known efficiency issue, I code to my level as much as possible. I occasionally will use code outside of my comfort zone in the hope of adding it at some point to my bag of tricks, but some code just doesn't "sound" right and becomes more of a maintenance issue than anything else. Another thing to consider is the lifespan of the code in question. If this is a one-time or seldom-run piece of code, you most likely won't recover the time "wasted" in making it more efficient. Optimize only when it is beneficial in the grand scheme of things.
A3) Using different benchmarking tools and code profiling, you can determine what is causing the bottleneck and code those sections differently.