Beefy Boxes and Bandwidth Generously Provided by pair Networks
Don't ask to ask, just ask

Re: Re: A memory efficient hash, trading off speed - does it already exist?

by JPaul (Hermit)
on Feb 04, 2003 at 04:28 UTC ( #232461=note: print w/replies, xml ) Need Help??

in reply to Re: A memory efficient hash, trading off speed - does it already exist?
in thread A memory efficient hash, trading off speed - does it already exist?

Can you explain to me why the structures that store the data take over 10 times the size of the actual data itself? I believe you, I just don't understand why.
What does storing the data in this format provide? Speedy STOREs and FETCHes? Is that it?

I have a friend busily telling me that if this was C he could simply construct a linked-list struct and wander across the structs, which would undoubtedly take up less memory. While I'm sure thats not as magically efficient as he thinks, and I know the lookups would be phenomenally slow, its hard to convince him :)

-- Alexander Widdlemouse undid his bellybutton and his bum dropped off --

  • Comment on Re: Re: A memory efficient hash, trading off speed - does it already exist?

Replies are listed 'Best First'.
Re: Re: Re: A memory efficient hash, trading off speed - does it already exist?
by MarkM (Curate) on Feb 04, 2003 at 05:50 UTC

    The features that Perl data structures provide over minimal C data structures include at least the following:

    1. Reference counting to ensure prompt and reliable garbage collection. Once the reference count hits 0, the structure can be reclaimed.
    2. Data structures provide storage for multiple form values. For instance, if a string is used as an integer, and as a number, further accesses of the string as either a string, an integer or a number do not require for the string to be converted each time.
    3. In-place upgrading of types. For instance, if a type is blessed, or if a hash is tied, or if an integer becomes a string, all references (active references, not the same as Perl scalar references) remain valid. For comparison, consider the C linked list your friend mentions. Now change the list to a b-tree. Will all data structures that referred to the list now be ok referring to the b-tree? What if the b-tree data structure is bigger than the list data structure and requires being moved?

    Another thing to explain to your friend is that the HASH structure is a form of indexed linked list. First, the key is hashed into an integer form, then the integer is used to produce an index into the array of linked lists.

    The overhead for a standard Perl string is at least 24 bytes (on a 32-bit architecture). This is each of: a reference count (4 bytes), flags (4 bytes), a reference to a more specific data structure (4 bytes), a reference to the string content (4 bytes), the length of the string (4 bytes), the space allocated for the string as it is likely less than the length (4 bytes).

    The overhead for a standard Perl hash is at least: (again, for a 32-bit architecture)

    60 bytes + ((20 + length of average key) * number_of_keys)) + (4 * size of array used within hash) + (24 * length of average value, assuming all string values)

    When I referred to the amount you would save for not pre-allocating hash buckets, I was referring to "4 * size of array" number above, which is almost insignificant compared to the rest of the overhead. The above only describes a hash mapping strings to strings. In your case, you are mapping a string to a scalar reference to a hash that maps strings to numbers.

    Your specific case is pushing for title of 'application that best makes Perl look like a pig.' Usually, Perl is able to achieve significant performance gains by using more memory. In your case, the sheer size of the data structures likely results in the opposite effect.

Log In?

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://232461]
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others rifling through the Monastery: (4)
As of 2023-05-31 00:49 GMT
Find Nodes?
    Voting Booth?

    No recent polls found