This would solve some of the bloat, but probably not too much of the speed decrease that you've seen.

What I think you're running into is a basic shortcoming of hashes in general, and Perl in particular. What happens is that each hash key is internally "hashed" into a 4-byte integer value. And these keys are then put in a sorted list, so you can quickly perform a binary search on it to check whether the key already exists or not.

Ideally every different key should have a different hash value. Reality however shows that with huge amounts of keys, the chance of two different key values getting the same hash value, is actually quite large. This situation is then handled by creating a list of all the keys that have that same hash value. In some "worst case" scenarios, you might actually wind up with a single hash value for all keys given.

The search in the linked list is linear and therefore needs more and more time for each key added.

Now, the distribution of the hash keys in your dataset may be have some worst cases in it.

So how can this be fixed? Good question. You could try changing the keys by adding some other, fixed string to the keys. This should change the distribution of the hash key values, hopefully reducing the worst case.

If you're not stuck with the current Perl version, you might want to try out 5.8.1-RC1, as this has a new feature that will randomize the distribution of the hash keys. This in itself will not reduce the worst case scenario, but will make it a moving target, so you won't have that bad performance all of the time.

This randomization was introduced because the delay that you experienced, is actually a way to create a Denial of Service attack for CGI-scripts.

Hope this helps.

Liz


In reply to Re: Re: Slowness when inserting into pre-extended array by liz
in thread Slowness when inserting into pre-extended array by ryangabbard

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.