
Greetings all;
I have a piece of code I've been fiddling around with that's designed to emulate natural speech, learning from user input. (Very simply, a learning chatterbox.)

I've been surprised by how much memory the data takes up, given how small it is when written to disk. I use twin hashes, storing practically the same data, but in a different order. The script learns a sentence in two directions (front to back, back to front) so it can generate a sentence in either direction from a given keyword.
Right now each hash takes up 727k on disk (a 1.4M "brain" in total), but when loaded into memory the data takes up a remarkable 16M! (I've loaded the software without data to verify that the data is what uses the memory.)
My hash is put together like so:

$VAR1 = {
          'Word1_Word2' => {
                             'Sym1' => 3,
                             'Sym2' => 1 
                           },           
          'Word3_Word4' => {
                             'Sym4' => 3,
                             'Sym3' => 1
                           },
          'Word5_Word6' => {
                             'Sym5' => 1
                           }
        };
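
To make the shape concrete, here is a minimal sketch of how a structure like this could be filled in both directions. It assumes the key is a pair of adjacent words joined with an underscore and that each "Sym" is the word following that pair; `learn`, `%forward` and `%backward` are hypothetical names for illustration, not the actual script.

```perl
use strict;
use warnings;

# Hypothetical helper: for every adjacent word pair, count the word that follows it.
sub learn {
    my ( $brain, @words ) = @_;
    for my $i ( 0 .. $#words - 2 ) {
        my $key = join '_', @words[ $i, $i + 1 ];    # e.g. 'Word1_Word2'
        $brain->{$key}{ $words[ $i + 2 ] }++;        # e.g. 'Sym1' => count
    }
    return;
}

my ( %forward, %backward );
my @sentence = qw(the cat sat on the mat);

learn( \%forward,  @sentence );            # front to back
learn( \%backward, reverse @sentence );    # back to front
```

Each inner hash here is a full Perl hash of its own, even when it holds a single symbol, which is part of why I suspect the in-memory size balloons.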

For comparison, I write every entry to disk in the format:

Word1 \a Word2 \00 Sym1 \00 3 \n
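
For reference, a rough sketch of how one such record could be read back into the hash. It assumes the spaces in the format line above are only there for readability, so the real separators are a bare `\a` between the two words and `\0` between the remaining fields; `load_brain` is a hypothetical name, not the actual loader.

```perl
use strict;
use warnings;

# Hypothetical loader: one record per line, "Word1\aWord2\0Sym\0Count\n".
sub load_brain {
    my ($fh) = @_;
    my %brain;
    while ( my $line = <$fh> ) {
        chomp $line;
        my ( $pair, $sym, $count ) = split /\0/, $line, 3;
        next unless defined $count;    # skip malformed lines
        my ( $w1, $w2 ) = split /\a/, $pair, 2;
        next unless defined $w2;
        $brain{"${w1}_${w2}"}{$sym} = $count;
    }
    return \%brain;
}

# Demo on an in-memory file holding a single record.
my $data = join( "\0", "Word1\aWord2", 'Sym1', 3 ) . "\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $brain = load_brain($fh);
close $fh;
```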

Can you fine gentlemonks suggest a better way of storing data in memory, while also being easy to reference?
My thanks,

JP,
-- Alexander Widdlemouse undid his bellybutton and his bum dropped off --

In reply to A more memory efficient storage structure? by JPaul
