in reply to Re: Benchmarking A DB-Intensive Script
in thread Benchmarking A DB-Intensive Script

Hi tilly,

Thank you: this is really sound advice. In a way it's common sense, but it's common sense I hadn't been applying.

I first quantified just how much "too slow" the overall program is. It's averaging about 75 seconds per 1000 records, and I'm aiming for 24 seconds per 1000, so that gave me a target.

Next, I looked at CPU usage, and indeed it *is* climbing quite slowly, which points to a DB or network bottleneck. I checked for indices and all seemed okay, so I simply copied the relevant DB tables to the machine running the analysis. BOOM: averaging 49 seconds per 1000 records.

So big-picture thinking got me halfway there. I'm now profiling to find exactly where the code spends its time, and to see whether it's worth loading the property tables into memory.
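If the profile does point at the per-record lookups, one rough way to preload a property table into a hash would be something like this (a sketch only; the DSN, table, and column names below are invented, not my real schema):

    use strict;
    use warnings;
    use DBI;

    # Hypothetical connection details -- replace with the real DSN and schema.
    my $dbh = DBI->connect( 'dbi:mysql:database=analysis', 'user', 'pass',
                            { RaiseError => 1 } );

    # Slurp the whole property table into a hash up front, so the hot loop
    # does a hash lookup instead of a query per record.
    my %property;
    my $sth = $dbh->prepare('SELECT id, value FROM property');
    $sth->execute;
    while ( my ( $id, $value ) = $sth->fetchrow_array ) {
        $property{$id} = $value;
    }

    # ... then inside the per-record loop:
    # my $value = $property{$record_id};

Whether that pays off depends on how big the property tables are and how often they're hit, which is exactly what the profile should tell me.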

Good solid advice, thanks. ++tilly.


Re^3: Benchmarking A DB-Intensive Script
by tilly (Archbishop) on Mar 14, 2006 at 22:36 UTC
    You're close enough that understanding the big picture may get you the rest of the way there as well. :-)

    Try running two copies of the script at the same time. See how fast they go. Then three. Then four. Find the point where you don't run faster by running more copies.
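    As a rough way to try that (just a sketch; the script name, the --out option, and the output file names here are invented), a tiny fork-based driver could launch N copies and wait for them all:

        #!/usr/bin/perl
        use strict;
        use warnings;

        # Launch N copies of the analysis script in parallel, each writing
        # to its own output file, then wait for every child to finish.
        my $copies = shift || 2;

        for my $n ( 1 .. $copies ) {
            my $pid = fork;
            die "fork failed: $!" unless defined $pid;
            if ( $pid == 0 ) {
                # Child: run one copy of the (hypothetical) analysis script.
                exec 'perl', 'analyse.pl', "--out=results_$n.txt";
                die "exec failed: $!";
            }
        }

        wait() for 1 .. $copies;    # parent waits for every child

    Run it under time with 2, 3, 4 copies and watch where the overall throughput stops improving.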

    Each one will write the examples it finds to its own file. It is trivial to go back afterwards and remove duplicates from those files. (Sort the two items of each pair alphabetically within the line, so the same pair always produces an identical line, then run the files through sort -u to find and remove duplicates.)
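
    A minimal script along those lines (assuming one whitespace-separated pair per line; the file names are invented) might look like:

        #!/usr/bin/perl
        use strict;
        use warnings;

        # Canonicalise each pair so (a, b) and (b, a) become the same line,
        # then print each distinct pair exactly once.
        my %seen;
        while (<>) {
            chomp;
            my $canon = join "\t", sort split;
            print "$canon\n" unless $seen{$canon}++;
        }

    Used as perl dedup_pairs.pl results_*.txt > unique_pairs.txt, or just do the canonicalising step and pipe the output through sort -u as described above.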