PerlMonks  
Greetings all;

Of late I've been fiddling with a rather simple AI chatterbot. Nothing remarkably clever: it learns from input, generates sentences from keywords, and so on and so forth. This particular bot has been around for a couple of months now and has managed to learn a great many things.
Recently I queried why it was taking up so much memory. Now I have a better understanding, and as it's grown, I realise that I most certainly do need to try something different for storing the data in memory.

The data consists of two large hashes of hashes, each linking a keyword to its potential following symbols and a weighting count for each. On disk, uncompressed, this takes up some 6.5M for some 120,000 possible word combinations. In memory it takes up 70M. That's a big difference.
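For anyone trying to picture the layout, here is a minimal sketch of one such hash of hashes. The names `%next_word` and `learn()` are made up for illustration and are not the bot's actual code, and the word splitting is deliberately naive.

```perl
use strict;
use warnings;

# Hypothetical names: %next_word and learn() are illustrative only.
# Layout: keyword => { following symbol => weighting count }
my %next_word;

sub learn {
    my ($sentence) = @_;
    my @words = split ' ', lc $sentence;

    # Count each adjacent pair; the inner hash is autovivified on first use.
    for my $i (0 .. $#words - 1) {
        $next_word{ $words[$i] }{ $words[ $i + 1 ] }++;
    }
    return;
}

learn('the cat sat on the mat');
learn('the cat ran');

# Show what may follow "the", with weights.
for my $follower (sort keys %{ $next_word{the} }) {
    print "the -> $follower ($next_word{the}{$follower})\n";
}
```

Every keyword gets its own inner hash here, however few followers it has, which is where the per-keyword overhead comes from.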
My understanding is that perl, in its infinite and speedy wisdom, preallocates memory when it creates a hash, so that when you add elements you aren't constantly reallocating. As the word base has grown, it's become apparent that learning slows as he sees fewer and fewer things that he doesn't already know. Logical. This means he is only infrequently adding new things to his data set. More importantly, of the 120,000 hashes, some 90% have fewer than 4 entries... That's a lot of wasted preallocated memory.
So what I'm wondering is: is there an existing hash structure (one I can use à la tie()) that doesn't preallocate, so that I'm only allocating memory I actually need? I can stand the tradeoff in speed right about now.

If this doesn't already exist, what suggestions do you have for building my own? (I figure I'd rip off the existing built-in perl code for doing hashes and just have it not preallocate)...
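To make the tradeoff concrete, here is a rough sketch of the sort of thing I have in mind, though it is a different technique from a non-preallocating hash: drop the inner hashes entirely and keep each keyword's followers packed into a single string. Everything in it is an assumption for illustration: the names `%packed`, `bump()` and `unpack_counts()`, and the choice of "\0" and "\x01" as separators (which presumes neither byte ever appears in a word).

```perl
use strict;
use warnings;

# Hypothetical sketch: one flat hash whose values are packed strings of
# "follower\0count" records joined by "\x01", instead of an inner hash
# per keyword. Assumes words never contain "\0" or "\x01".
my %packed;

# Add one observation of $follower after $keyword.
# Slow on purpose: unpacks and repacks the whole record on every update.
sub bump {
    my ($keyword, $follower) = @_;
    my %counts = unpack_counts($keyword);
    $counts{$follower}++;

    my @records;
    push @records, join("\0", $_, $counts{$_}) for sort keys %counts;
    $packed{$keyword} = join "\x01", @records;
    return;
}

# Return a (follower => count) list for $keyword, empty if unknown.
sub unpack_counts {
    my ($keyword) = @_;
    my $blob = $packed{$keyword};
    return () unless defined $blob;
    return map { split /\0/, $_, 2 } split /\x01/, $blob;
}

bump('the', 'cat') for 1 .. 2;
bump('the', 'mat');

my %seen = unpack_counts('the');
print "$_ => $seen{$_}\n" for sort keys %seen;
```

Whether this actually saves enough would need measuring on the real data (Devel::Size from CPAN reports the size of a structure); the same two functions could presumably sit behind a tie() interface so the rest of the bot wouldn't need to change.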

My thanks
JP Hindin
-- Alexander Widdlemouse undid his bellybutton and his bum dropped off --


In reply to A memory efficient hash, trading off speed - does it already exist? by JPaul