Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl-Sensitive Sunglasses
 
PerlMonks  

Re: A memory efficient hash, trading off speed - does it already exist?

by Wysardry (Pilgrim)
on Feb 04, 2003 at 01:23 UTC ( [id://232437]=note: print w/replies, xml ) Need Help??


in reply to A memory efficient hash, trading off speed - does it already exist?

This may be a really dumb idea, but I've used a similar method before (in BASIC) to save memory space, back when 64Kb was considered huge.

Couldn't you store the key/value pairs back to back in a single string, and then pull them out with a regex?

The format could be something like "key1:value1|key2:value2|key3:value3|" allowing you to search for the key, and if found grab it and everything up to the next | character, then split on the : character to get the corresponding value.

You could use whatever separators you wished, but there would need to be two different ones for this to work and to make it more readable for humans.

Unfortunately, I'm not confident enough with the syntax of regular expressions (yet) to give an example, but from the sounds of it you know enough to work that out for yourself and to tell if I'm spouting complete nonsense.

__________
"Every program has at least one bug and can be shortened by at least one instruction -- from which, by induction, one can deduce that every program can be reduced to one instruction which doesn't work." -- (Author Unknown)

  • Comment on Re: A memory efficient hash, trading off speed - does it already exist?

Replies are listed 'Best First'.
Re: Re: A memory efficient hash, trading off speed - does it already exist?
by Mr_Person (Hermit) on Feb 04, 2003 at 01:29 UTC
    Well, but then you run into the problem of if you want to use a colon or a pipe in one of your keys or values. I would immagine it would also be quite a bit slower for large amounts of key/value pairs than using a real hash.

      That's why I said that any separating characters could be used, meaning that ones that didn't clash could be picked. As a Perl program would most likely be used to convert the existing hashes, characters not normally directly accessible on the keyboard (above ASCII 127) could be used.

      The original poster also stated that speed wasn't an issue.

      It wouldn't be practical for every application, but might work in this instance. A similar method was used by several text adventure games on machines with limited memory or no array facilities.

Re: Re: A memory efficient hash, trading off speed - does it already exist?
by JPaul (Hermit) on Feb 04, 2003 at 04:37 UTC
    Indeed I am also considering something similar to this, although I'm not entirely sure if it would work quite as I hope.

    The data is constructed like so:

    %hash{Keyword} -> {Symbol_1} = Weighting %hash{Keyword} -> {Symbol_2} = Weighting
    And so on. I think perhaps I could slim down memory considerably by doing something like the following:
    %hash{Keyword} = "Symbol_1 | Weighting, Symbol_2 | Weighting"
    And, as Mr_Person has already noted, no I don't intend to use | and , as separators. Already when writing data to the disk as a "brainsave" I use \x255 as a record separator, just like you were thinking. Perhaps the fact that I started programming in BASIC as well might have something to do with our shared mindset? :)

    JP,
    -- Alexander Widdlemouse undid his bellybutton and his bum dropped off --

      If anything, you should probably switch to a structure like
      $hash{"Keyword\0Symbol_1"} = $Weighting[1]; $hash{"Keyword\0Symbol_2"} = $Weighting[2];

      That way you still get full hash lookup performance on the entire (two dimensional) key. And you avoid creating whole legions of hashes. That should pretty much fix 90% of your problem. And such a flat hash is trivial to tie to a DBM file.

      Don't knock DBMs without benchmarking them. DBM files are specifically laid out to allow for rapid retrieval - paging just indiscriminately swaps out whatever seems least necessary, including other parts of the system besides the data. Drawing any conclusions about one based on the other borders on ridiculous.

      Makeshifts last the longest.

        This requires I know what the following symbol is.
        The point of the hash is to look up potential following symbols of the keyword, unless there's someway to glob hash keys without going through them to compile a list, it's not going to work at all.

        JP,
        -- Alexander Widdlemouse undid his bellybutton and his bum dropped off --

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://232437]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others pondering the Monastery: (4)
As of 2024-04-18 22:46 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found