in reply to Code Efficiency: Trial and Error?


- how do other Monks go about optimising their code?
- when faced with more than one obvious way to do something, what's the determining factor?
- surely this has to be more than "trial and error" ..? *grin*

When optimizing, you have to ask yourself "How much time will this really save?", and weigh that against "How maintainable is each method?".

And then there's "How much time do I have to play with this?" :-)

I was recently trying to optimize some homegrown Korn shell menu scripts of ours where, at one point, the user had to wait about 10 seconds for a prompt (I did end up getting it down to 2-3 seconds). One function looked up the alias (filename) for a given customer code (the last field on a certain line). The function was doing this:

grep " $code$" * | awk -F: '{print $1}'
So it was grep'ing through all the files in the directory even though it could have quit at the first matching line. I rewrote it as this (also realizing that the target line always started with 'Customer Code'):
awk '/^Customer Code.* '$code'$/{print FILENAME; exit}' *
But when you hit the 'Customer Code' line in one file and it's not the one you want, you'd like to close that file right there and move on to the next one, especially since the 'Customer Code' line was always the second or third line of a 40-100 line file. gawk has a 'nextfile' statement which does exactly this, but I'm stuck with plain awk for now.
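If gawk were available, the one-liner might look something like this (an untested sketch, assuming the customer code is the last whitespace-separated field on the 'Customer Code' line, as described above):
gawk -v code="$code" '/^Customer Code/ { if ($NF == code) { print FILENAME; exit } else nextfile }' *
So let's try perl: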
perl -e ' $site=shift; opendir DIR, "."; @ARGV = readdir DIR; closedir DIR; while(<>) { if (/^Customer Code.* (\w+)$/) { print("$ARGV\n"),exit if $site eq $1; close ARGV; } }' $code
This, on average, goes twice as fast as the original, but at the cost of readability (especially since no one else here knows perl all that well). And then it turned out that this function wasn't even called during that particularly slow prompt, and was only being called once per execution (in another place), so I'd be saving a whole 0.03 seconds (which the user wouldn't even notice) by doing it with perl. But I'm leaving the perl in for now, along with the old line commented out, and a comment to the effect of "it's a lot more fun doing it this way" :-)
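If you want to put actual numbers on comparisons like this, a rough sketch is just to time a loop over each variant; lookup_site here is a hypothetical wrapper function around whichever version is being tested:

# Rough timing sketch: run one variant 100 times and compare wall-clock times.
# lookup_site is a hypothetical wrapper around whichever version is under test.
run_lookups() {
    integer i=0
    while (( i < 100 ))
    do
        lookup_site "$code" > /dev/null
        (( i += 1 ))
    done
}
time run_lookups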

Update: As a (hopefully) final note, even though the above code wasn't what was slowing down that especially slow prompt, I did finally get the prompt down to almost instantaneous by replacing the offending section with perl. The problem was that there were about 90 'customer' files, and the script was fork/exec'ing grep/awk/etc. for each file, so I just read each file in a single perl process, saved what I needed in a hash, and printed the result at the end:

site_list=$( perl -e '
    while (<>) {
        if ($ARGV eq "/etc/uucp/Systems") {
            $system{$1} = undef if /^(\w+)/;
            next;
        }
        close ARGV, next unless exists $system{$_};
        $system{$ARGV} = $1, close ARGV if /^Customer Code.*\s(\w+)\s*$/;
    }
    $, = " ";
    print sort(values %system), "\n";
' /etc/uucp/Systems *)
So some things are worth optimizing. In actual time it only saves about 3 seconds (about 10 compared to the original), but the annoyance it saves is priceless :-)

Re: Code Efficiency: Trial and Error?
by Abigail-II (Bishop) on Oct 14, 2002 at 12:39 UTC
    I wouldn't have used perl or awk, but stayed with grep: grep -l would have done the same as your Perl script, but likely more efficiently.

    Abigail

      I like the simplicity of grep -l, though as Aristotle points out, it still scans all the files (it's probably what I'll end up with anyway, just for the sake of maintenance, and '-l' short-circuiting the match within each file is 'good enough'). If I just look for /_$code$/, it is about as fast as the perl script when all the files need to be scanned anyway (and perl isn't all that much quicker even when the match occurs within the first few files). But when I change the pattern to "^Customer Code.* $code$", grep is about 3x slower. grep and sed are good at very simple regexes, but perl seems to outperform them once they get even mildly complex.
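      For the record, the grep -l variants being compared would look something like this (reconstructed from the patterns discussed above, not copied from the actual scripts):
      grep -l " $code$" *                      # simple pattern: about as fast as the perl version
      grep -l "^Customer Code.* $code$" *      # anchored pattern: roughly 3x slower with this grep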
Re^2: Code Efficiency: Trial and Error?
by Aristotle (Chancellor) on Oct 14, 2002 at 13:39 UTC

    Update { should have benchmarked properly, apparently.. rest of this post largely invalidated by runrig's reply. }

    Abigail-II's suggestion of grep -l does not fit your spec, as it will still scan all the files, but your Perl can be simplified:

    perl -e ' $site = shift; for $f (@ARGV) { local @ARGV = $f; /^Customer Code/ && last while <>; / \Q$site\E$/ && (print("$f\n"), last); } ' $code *
    But why all that? A short-circuiting sed script can find a customer's code:
    sed 's/^Customer Code[^ ]* //; t done; d; : done; q' FILE
    Wrap some sh around it and it does the job:
    for ALIAS in * ; do
        [ "`sed 's/^Customer Code[^ ]* //; t done; d; : done; q' "$ALIAS"`" = "$code" ] && break
    done
    echo "$ALIAS"
    (Wrote this in bash, not sure if it works 1:1 in Korn, but it should be easy to port anyway.)

    Update: Btw: when you're passing {print $1} to awk, it's a sign you really wanted to use cut; in your case, that would be cut -d: -f1
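    Applied to the original pipeline at the top of the thread, that would be something like:
    grep " $code$" * | cut -d: -f1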

    Makeshifts last the longest.

      I didn't do the file globbing outside of the perl script because that seemed to be slower, and a readdir solution seemed to be faster than a glob inside the script. I corrected your shell solution and rewrote it for Korn:
      for ALIAS in *
      do
          [[ $(sed -n 's/^Customer Code.* //
              t done
              d
              : done
              p
              q' "$ALIAS") = $code ]] && break
      done
      print $ALIAS
      But this seems to be the slowest solution of all, probably due to having to fire up a sed process for every file, and maybe also due to the specific regex. But like I said in my reply to Abigail-II's post above, I'll probably end up with grep -l just for the simplicity of it.

      Update: In addition to Aristotle's updated note, another sign is that if you pipe grep to or from sed, awk, and maybe even cut, you are probably doing too much work, and may be better off using just one of those commands instead.
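      For example (a generic sketch, not code from the scripts above), a pipeline like
      grep 'pattern' file | awk '{print $2}'
      can usually collapse into a single command:
      awk '/pattern/ {print $2}' file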