Without knowing why you are putting so many keys in a hash, it's hard to say what to do. (Note that the number of lines in the file matters less than the number of different words.) One obvious saving is chopping off the newline; that would save you a couple of MB.
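To illustrate the newline point, here is a minimal sketch. The word list and the %seen hash are made up for the example (your real input would come from the file), but it shows the newline being removed before the line is used as a key:

```perl
use strict;
use warnings;

# Hypothetical input: one word per line, read from an in-memory string
# so that the example is self-contained.
my $data = "apple\nbanana\napple\ncherry\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my %seen;
while (my $line = <$fh>) {
    chomp $line;        # remove the trailing newline before using the line as a key
    $seen{$line}++;     # repeated words share a single key
}
close $fh;

printf "%d distinct keys\n", scalar keys %seen;
```

The saving is only one byte per key, so it helps but will not by itself rescue a hash that is far too large for memory.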

But you might consider using a disk-based data structure: perhaps a database, or one of the DBM-style files. A trie was suggested as well, but I'm not sure how much it will save. Obviously, the amount of string data is reduced, but at the cost of introducing more hashes (or arrays), which themselves come with quite a lot of memory overhead. How much you gain will depend on the amount of prefix duplication in the data set.
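As a rough sketch of the disk-based idea, a hash can be tied to a file so that the keys live on disk rather than in memory. I'm assuming SDBM_File here only because it ships with Perl; DB_File or another DBM module ties in much the same way. The file name and word list are invented for the example:

```perl
use strict;
use warnings;
use Fcntl;                      # provides O_RDWR and O_CREAT
use SDBM_File;
use File::Temp qw(tempdir);

# Hypothetical location; SDBM creates words.pag and words.dir here.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/words";

tie my %seen, 'SDBM_File', $file, O_RDWR | O_CREAT, 0666
    or die "Cannot tie $file: $!";

# The tied hash is used like an ordinary one; entries are stored on disk.
for my $word (qw(apple banana apple cherry)) {
    $seen{$word}++;
}

# Walk the entries one pair at a time instead of building a list of all keys.
my $count = 0;
while (my ($word, $n) = each %seen) {
    $count++;
}
my $apple = $seen{apple};

untie %seen;
print "$count distinct keys, apple seen $apple times\n";
```

Every lookup now goes through the file, so expect this to be noticeably slower than an in-memory hash. Whether that trade is acceptable depends on what the hash is used for.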


In reply to Re: Out of Memory by JavaFan
in thread Out of Memory by instinct_4ever
