in reply to Berkeley DB performance, profiling, and degradation...
I've followed the advice of perrin and crazyinsomniac and switched to BerkeleyDB, which made things just a touch slower than the original DB_File-based program, but not enough to worry over. I then followed Randal's advice to lose the tie interface. My reason for sticking with BerkeleyDB rather than reverting to DB_File is that the direct API for BerkeleyDB is better documented and a bit more complete.
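For anyone following along, the change from the tied interface to the direct API looks roughly like this. It is a minimal sketch, not the actual indexing code: the temp directory, file names, key, and value are all made up for illustration, and it assumes the BerkeleyDB module is installed.

```perl
use strict;
use warnings;
use BerkeleyDB;
use File::Temp qw(tempdir);

# Hypothetical scratch location; the real program's paths are not shown in this post.
my $dir = tempdir(CLEANUP => 1);

# Tied interface: every hash access is routed through the tie layer.
my $via_tie;
{
    tie my %entries, 'BerkeleyDB::Hash',
        -Filename => "$dir/tied.db",
        -Flags    => DB_CREATE
        or die "Cannot tie database: $BerkeleyDB::Error\n";

    $entries{'some/key'} = 'value';      # store through the tied hash
    $via_tie = $entries{'some/key'};     # fetch through the tied hash
    untie %entries;
}

# Direct API: call db_put and db_get on the database object itself.
my $db = BerkeleyDB::Hash->new(
    -Filename => "$dir/direct.db",
    -Flags    => DB_CREATE,
) or die "Cannot open database: $BerkeleyDB::Error\n";

# Both methods return 0 on success and a non-zero status otherwise.
$db->db_put('some/key', 'value') == 0
    or die "db_put failed: $BerkeleyDB::Error\n";

my $via_api;
my $status = $db->db_get('some/key', $via_api);   # fills $via_api on success
if ($status == 0) {
    print "found: $via_api\n";
}
elsif ($status == DB_NOTFOUND) {
    print "no such key\n";
}
else {
    die "db_get failed: $BerkeleyDB::Error\n";
}

undef $db;   # dropping the last reference closes the database
```

The direct calls hand back a status code instead of silently returning undef, so a missing key and a real error can be told apart.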
Dropping the reliance on a tied hash for db entries gave me another 10% speed boost! So, the numbers now look like this for the 'long test' (which is still only a small percentage of the full working set):
Total Elapsed Time = 2010.809 Seconds
  User+System Time = 1889.459 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 41.8   790.8 2354.8 105111   0.0075 0.0224  main::add_entry
 32.9   623.2 621.75 618805   0.0010 0.0010  File::QuickLog::print
 15.4   291.0 290.92  55361   0.0053 0.0053  BerkeleyDB::Common::db_get
 7.20   136.0 135.74 105111   0.0013 0.0013  BerkeleyDB::Common::db_put
 1.92   36.27 1886.0  49750   0.0007 0.0379  main::process_file
 0.45   8.574 5684.3    242   0.0354 23.489  main::recurse_dir
 0.40   7.528  7.361  56262   0.0001 0.0001  main::find_parent
 0.31   5.858  5.545 105111   0.0001 0.0001  Digest::MD5::md5_hex
 0.01   0.160  0.160      1   0.1600 0.1600  main::get_cache_dirs
 0.01   0.160  0.878      5   0.0320 0.1756  main::BEGIN
 0.01   0.150  0.509      1   0.1498 0.5087  IO::import
 0.01   0.110  0.109    261   0.0004 0.0004  File::QuickLog::_datetime
 0.01   0.100  0.120     64   0.0016 0.0019  Exporter::import
 0.00   0.080  0.110      6   0.0133 0.0183  IO::Socket::BEGIN
 0.00   0.070  0.070      1   0.0700 0.0700  BerkeleyDB::Term::close_everything
So we've shaved 232 seconds off the total elapsed runtime, a bit more than a 10% gain. Of the time that remains, 623 seconds is spent in logging calls, all but ~100 seconds of which can be stripped out for the final version. More important for the long haul, the sec/call for all database-related functions is down by about 15%. I'm hoping this means that the degradation of DB performance is significantly lessened by using direct calls rather than a tied interface.
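The arithmetic behind those figures can be checked with a few lines of Perl. This is a rough sketch using only the numbers quoted in this post. It makes two assumptions: that the logging time can be subtracted straight from the elapsed time (it was measured as profiler time, so this is only an estimate), and that each add_entry call corresponds to one entry.

```perl
use strict;
use warnings;

# Figures taken from the profile and text in this post.
my $elapsed_now  = 2010.809;   # total elapsed seconds for the long test
my $saved        = 232;        # seconds shaved off by dropping the tie
my $logging      = 623.2;      # exclusive seconds in File::QuickLog::print
my $logging_kept = 100;        # approximate logging time that stays
my $entries      = 105111;     # add_entry calls (assumed: one per entry)

# Relative gain, measured against the previous (slower) run.
my $elapsed_before = $elapsed_now + $saved;
my $gain_pct       = 100 * $saved / $elapsed_before;

# Estimated runtime once most of the logging is stripped out.
my $projected = $elapsed_now - ($logging - $logging_kept);

# Throughput of this test run, now and as projected.
my $rate_now       = $entries / $elapsed_now;
my $rate_projected = $entries / $projected;

printf "gain over previous run:   %.1f%%\n", $gain_pct;
printf "projected elapsed time:   %.1f s\n", $projected;
printf "entries/sec now:          %.1f\n", $rate_now;
printf "entries/sec projected:    %.1f\n", $rate_projected;
```

These rates only describe the long test, which is a small part of the full working set, so they say nothing yet about what is sustainable once the database grows.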
I'm starting a full index run right now, so hopefully by tomorrow night I'll have more to say on the subject. Here's hoping for 15 entries/sec sustainable...
Randal gets my thanks this time, for providing the most useful suggestion in this round of tweaks.