cormanaz has asked for the wisdom of the Perl Monks concerning the following question:

Howdy bros. I have some code I'm looking to speed up, and a java geek friend of mine said I should look into hash optimization, specifically efficient hashing algorithms and hash size optimization (to avoid re-hashing). These things are apparently standard fare in java and C++ etc., but after some looking around I'm not turning up much about it with respect to Perl. Am I not looking in the right places, or is this not something one worries about in Perl?

Thx....Steve

Replies are listed 'Best First'.
Re: Hash optimization in Perl?
by derby (Abbot) on May 01, 2007 at 23:54 UTC

    There are really only two optimizations you can do with perl hashing: speed or size.

    If you know the approximate size of your hash and want to speed up access, you can presize your hash

    keys(%hash) = 1024;
    Given the set of keys you have, that may or may not actually speed things up. The theory is that by allocating buckets up front, you avoid the slowdowns that occur when the hash has to be re-allocated as it grows.
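    A minimal sketch of pre-sizing in action (the hash name, the requested size, and the made-up keys are all illustrative, not from the original post):

    ```shell
    perl -e '
    my %count;
    keys(%count) = 1024;            # ask perl to allocate buckets up front;
                                    # it rounds up to the next power of two
    $count{"key$_"}++ for 1 .. 1000;
    print scalar(keys %count), "\n";   # pre-sizing allocates buckets, not keys
    '
    ```

    Note that pre-sizing only reserves buckets; the hash still reports zero keys until you actually store something.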

    For memory size, if you think your hash is going to consume more memory than you have, you should look into the various Tie::Hash modules that have a db backend.
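    As one concrete (hypothetical) example of a db backend, SDBM_File ships with core perl and can be tied directly; the filename "spill" and the stored value are made up for illustration:

    ```shell
    perl -e '
    use Fcntl;        # for O_RDWR and O_CREAT
    use SDBM_File;    # core perl DBM implementation
    # tie the hash to an on-disk SDBM database so values live on disk, not in RAM
    tie my %h, "SDBM_File", "spill", O_RDWR|O_CREAT, 0666
        or die "tie failed: $!";
    $h{answer} = 42;              # this write goes through to the db file
    print $h{answer}, "\n";
    untie %h;
    '
    rm -f spill.pag spill.dir     # SDBM keeps the data in these two files
    ```

    The Tie::Hash family and the heavier DB_File/BerkeleyDB modules follow the same tie interface, so swapping backends later is cheap.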

    Either way, both are probably the last places I would look for optimizations.

    -derby

      For in-memory hashes with smaller size requirements, you could try Tie::GHash, which aims to offer smaller (but possibly slower) hashes. That could be useful if your system has the Gnome glib library available, which is where this module's hashes come from.

      Are you sure keys is an lvalue function?

      I get no complaint from scalar assignment, but no indication it worked, either. List assignment gives:

      $ perl -Mwarnings -Mstrict -e'keys( my %foo) = qw/foo bar baz/;print %foo'
      Useless use of a constant in void context at -e line 1.
      Useless use of a constant in void context at -e line 1.
      Argument "baz" isn't numeric in scalar assignment at -e line 1.
      $ 

      After Compline,
      Zaxo

        perldoc -f keys
        As an lvalue "keys" allows you to increase the number of hash buckets allocated for the given hash. This can gain you a measure of efficiency if you know the hash is going to get big. (This is similar to pre-extending an array by assigning a larger number to $#array.) If you say
        keys %hash = 200;
        then %hash will have at least 200 buckets allocated for it--256 of them, in fact, since it rounds up to the next power of two. These buckets will be retained even if you do "%hash = ()", use "undef %hash" if you want to free the storage while %hash is still in scope. You can't shrink the number of buckets allocated for the hash using "keys" in this way (but you needn't worry about doing this by accident, as trying has no effect).
Re: Hash optimization in Perl?
by GrandFather (Saint) on May 01, 2007 at 23:26 UTC

    It's generally not something to worry about with Perl. Can you provide a stripped down sample of your code that demonstrates where the speed issue is and a description of the context in which the code is used?

    Generally, finding an appropriate data structure and algorithm has much more impact on execution time than micro-adjustments such as hash optimization. Your code would have to be doing virtually nothing but accessing the hash for any optimization there to make any measurable difference at all to execution time.

    You may find tools such as Devel::DProf and Devel::FastProf help focus on the areas of code that really need attention.


    DWIM is Perl's answer to Gödel
Re: Hash optimization in Perl?
by Joost (Canon) on May 02, 2007 at 09:52 UTC
    Aside from pre-sizing your hashes as mentioned by derby above, there really isn't much you can do in perl to make hash-access faster. That's different in C++ and Java because those languages implement hashes in C++ and Java respectively. Perl hashes are implemented in (already pretty efficient) C.

    You could - maybe - implement a more efficient hashing algorithm in C that's tailored to your application, but I would be very surprised if it made any significant dent in your application run time.

    In my experience, if you want serious speed-ups, your program really is limited by CPU time (rather than IO speed), and your algorithm is already efficient (see also What is "Schwarzian Transform" (aka Schwartzian), for example), you can gain a lot by porting your heavy-lifting inner loops to C. And you will probably still use perl's hashes then.
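    For reference, the Schwartzian Transform linked above is exactly the kind of algorithmic win that dwarfs hash micro-tuning: it computes an expensive sort key once per element instead of once per comparison. A minimal sketch (the length() key and the word list are illustrative):

    ```shell
    perl -e '
    my @words = ("pearl", "is", "fun");
    # map -> sort -> map: compute the sort key (here, length) once per element
    my @by_length =
        map  { $_->[1] }                       # strip the cached key back off
        sort { $a->[0] <=> $b->[0] }           # compare the precomputed keys
        map  { [ length($_), $_ ] } @words;    # pair each word with its key
    print join(",", @by_length), "\n";
    '
    ```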