in reply to Optimization of DB_File sorting and processing code

Off the top of my head, You can specify the sort for the DB_File (using DB_BTREE), that would elimate the sort (keys (%hash)).

-Lee

"To be civilized is to deny one's nature."

Replies are listed 'Best First'.
Re: Re: optimize this code?
by crazyinsomniac (Prior) on Feb 09, 2002 at 14:18 UTC
    ++ Right on! Not only would it eliminate the sort, the resulting data structure would be no glorified hash (tilly don't consider a balanced tree one, I'm agreeing after a few /tell's), but a fulll fledged sorted balanced binary tree (extra efficiency in lookups).

    And now, my favorite activity besides programming, quoting from the pod:

    DB_BTREE

    The btree format allows arbitrary key/value pairs to be stored in a sorted, balanced binary tree. As with the DB_HASH format, it is possible to provide a user defined Perl routine to perform the comparison of keys. By default, though, the keys are stored in lexical order.

    update
    as for building the matrix, see what japhy said (it's got to do with how perl grows the array).

    The BTree will definetly help, but I'm not sure how noticably ... you might wanna look at the DB_Fille pod for an alternate method of accessing the key/value pairs (like the seq method), or use the BerkleyDB module.

    Question: Evanovich, how are you benchmarking this?

    Suggestion: If you, with the help of the monks, don't figure out how to optimize this sufficiently, you might wanna consider a C/C++ solution. BerkleyDB has a C/C++ interface, and your code isn't too complex/long ... please don't hurt me ;)(/duck)

     
    ______crazyinsomniac_____________________________
    Of all the things I've lost, I miss my mind the most.
    perl -e "$q=$_;map({chr unpack qq;H*;,$_}split(q;;,q*H*));print;$q/$q;"

      Well, what I don't understand Crazyinsomniac, is why this should take so long, to build the matrix I mean. Accessing the data is lightning fast, it's just that looping through the hash that is rough. Question: if I made my data into single entry vector tables, and then looped through the vector and loaded it into the matrix, maybe that would help? I don't want to go to C, because this code is just the beginning of much more complicated stuff, and I need perl's structures I think.
      I commented out the sort, and it doesn't really affect the loop time. Will this help the time of building the matrix, which seems to be taking up the bulk of each loop?