Beefy Boxes and Bandwidth Generously Provided by pair Networks
No such thing as a small change
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??
Hi misterperl,
I don't think that my previous "how to do it" post in this thread didn't adequately answer adequately answered your question.
You wrote: "I guess what I'm really asking is, is there a perlVar or function that is the TRUE array of hash key and one of values that,when changed, changes the keys or values?"
The answer is "no". There is no TRUE array of hash keys.

I will attempt to give a "short" answer to your "why not?" question. There may be some slight technical inaccuracies in pursuit of brevity. Also I will need some "C lingo" to explain what a Perl hash table actually "is", "under the covers".

The first step in the hashing process is to calculate what is called the "binary hash value". This is a single unsigned binary number. Each character in the "ASCII hash key" is fed into essentially an equation and as I said, the end result of this calculation is a single unsigned binary number. This calculation is designed to be very fast (and it is). This is not a simple "checksum" - it is more analogous to a CRC calculation.

A Perl hash starts out with 8 of what are called "buckets". Thus, the bucket array starts out at length 8. Each element of this bucket array is either not used (NULL pointer) or is a C pointer to a linked list of C structures. The C structure will contain things like the actual full ASCII key name that you see as the user of the hash and the the user accessible value of that key. I don't remember if it contains the originally calculated "binary hash value" for each particular ASCII key string or not.

Ok, to use this newly created hash with 8 "buckets", Perl uses only the least significant 3 bits of the "binary hash value" (000-111 binary). This is used as an array index into the bucket array. So when you have your write something like $hash{'user'}='XXZZY";, Perl calculates the "binary hash value" for "user", that value is "masked" so that only the least significant 3 bits are used. Perl looks at the "C linked list" associated with that "bucket". If there is no list there already, a new linked list is started with this new value. What happens if there is already a linked list at that array index? Perl has to look at each element of this C linked list to make sure that it is not modifying something that is already there. If an element with hash key, "user" is not there already, it is added to the end of the linked list.

If the hash structure stayed at only 8 "hash buckets" that wouldn't be very useful for a hash table with say 1,000 entries! So Perl has an algorithm to decide when to double the size of the hash. When the hash size doubles, Perl creates a new "bucket array" of say 16 values and then uses the least significant 4 bits of the "binary hash value" as an index into that new C array.

Every element of the original 8 bucket hash has to be re-visited to see what new bucket it falls into now! Once an additional bit of the "binary hash value" is exposed and used, that changes things!

If you are "with me" so far, making a hash key "a shorter ASCII string" will not necessarily decrease the size of the internal hash. It may even cause the size of the hash to double! This new "shorter key" has to be moved to the "right bucket". If that potentially different bucket is "over used", Perl may double the size of the hash to take care of that.

In modern Perl versions, the calculation of the "binary hash value" has a "per run" random fudge factor. This was done to prevent some security issues with having a repeatable "binary hash value". I don't understand all of the security issues, but this is what it is.

The "usage of a Perl hash" is accessible by the scalar value of the hash. This is a string, like "3/8" or "2/8". This means that either 3 or 2 buckets are used out of 8 possible buckets. I have updated my "how to do it" post with code that shows that. I get randomly either 3 or 2 on multiple succesive runs of the same code. This is as expected.

It is possible to "pre-size" a Perl hash, by my %hash=1000; That will round the number of "buckets" up to 1024 ( binary power of 2 number). This prevents overhead associated with the hash doubling, 8,16,32...1024. However, in my testing, this doesn't matter at all provided that you are doing something significant with the hash once it is created. I was working with hash sizes of 100K entries and above. Perl is very efficient in how it manages its internal hash tables. Just use the features of the language!

Note/Update 1: Doubling the hash size is actually not a significant memory use. This only doubles the array of pointers to C's linked list of structures. But in order to do this, Perl has to revisit each element in the hash and decide where it needs to go next. However as I mentioned before, Perl is very, very efficient about how it does a "doubling of the internal hash size".

Again, there is no "TRUE list" of hash keys. To generate keys %hash, Perl has to traverse the entire internal hash table and return a list of the results to you. But again, Perl does this very, very quickly.

I post this a separate response instead of an update to my original response because the content and subject matter is completely different.


In reply to Re: can I change hash keys or values directly by Marshall
in thread can I change hash keys or values directly by misterperl

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others learning in the Monastery: (5)
As of 2024-03-28 19:58 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found