Further along in the journey, but still haven't made it home:

I've followed the advice of perrin and crazyinsomniac, and switched to BerkeleyDB (which made things just a touch slower than the original DB_File based program--but not enough to worry over). I then followed Randal's advice to lose the tie interface (my reason for sticking with BerkeleyDB rather than reverting to DB_File is that the direct API for BerkeleyDB is better documented and a bit more complete).

Dropping the reliance on a tied hash for db entries gave me another 10% speed boost! So, the numbers now look like this for the 'long test' (which is still only a small percentage of the full working set):

Total Elapsed Time = 2010.809 Seconds User+System Time = 1889.459 Seconds Exclusive Times %Time ExclSec CumulS #Calls sec/call Csec/c Name 41.8 790.8 2354.8 105111 0.0075 0.0224 main::add_entry 32.9 623.2 621.75 618805 0.0010 0.0010 File::QuickLog::print 15.4 291.0 290.92 55361 0.0053 0.0053 BerkeleyDB::Common::db_ge +t 7.20 136.0 135.74 105111 0.0013 0.0013 BerkeleyDB::Common::db_pu +t 1.92 36.27 1886.0 49750 0.0007 0.0379 main::process_file 0.45 8.574 5684.3 242 0.0354 23.489 main::recurse_dir 0.40 7.528 7.361 56262 0.0001 0.0001 main::find_parent 0.31 5.858 5.545 105111 0.0001 0.0001 Digest::MD5::md5_hex 0.01 0.160 0.160 1 0.1600 0.1600 main::get_cache_dirs 0.01 0.160 0.878 5 0.0320 0.1756 main::BEGIN 0.01 0.150 0.509 1 0.1498 0.5087 IO::import 0.01 0.110 0.109 261 0.0004 0.0004 File::QuickLog::_datetime 0.01 0.100 0.120 64 0.0016 0.0019 Exporter::import 0.00 0.080 0.110 6 0.0133 0.0183 IO::Socket::BEGIN 0.00 0.070 0.070 1 0.0700 0.0700 BerkeleyDB::Term::close_e +verything

So we've shaved 232 seconds off of the total elapsed runtime...a bit more than a 10% gain. 623 seconds of that is logging calls, all but ~100 seconds of which can be stripped out for the final version. More important, of course, for the long haul, is that the sec/call for all database related functions is reduced by about 15%--I'm hoping this means that the degradation of DB performance is significantly lessened by the use of direct calls rather than a tied interface.

I'm starting a full index run right now, so hopefully by tomorrow night I'll have more to say on the subject. Here's hoping for 15 entries/sec sustainable...

Randal gets my thanks this time, for providing the most useful suggestion in this round of tweaks.


In reply to Re: Berkeley DB performance, profiling, and degradation... by SwellJoe
in thread Berkeley DB performance, profiling, and degradation... by SwellJoe

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.