I think you hit upon (and summarily eliminated) the best solution: a database. You have 1TB of data, and it's stored in flat files? Properly, this data should never have been put into files at all, but directly into a database, where you would simply query for the data you want as you want it. Yes, it takes up a fair bit more space than flat files, but you're trading space for speed. Disk space is cheap; CPU speed, not so cheap.

You say you're using an AMD64 machine. Are you running a 64-bit OS on it? If not, you may want to try that first: it may help your I/O speed somewhat, and it will probably help you use all of your memory.

Once you're using a 64-bit OS, it's time to get a 64-bit perl. With a 32-bit perl, you'll run out of address space long before you can load your data in.
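
If you're not sure which kind of perl you have, here is a minimal sketch that asks perl how it was built, using only the core Config module. The variable names are mine; ptrsize and ivsize are standard Config keys.

```perl
use strict;
use warnings;
use Config;    # core module describing how this perl was built

# ptrsize is the size of a pointer in bytes: 8 on a 64-bit perl, 4 on a 32-bit one.
my $pointer_bits = $Config{ptrsize} * 8;

# ivsize is the size of perl's native integer type, which matters for large offsets.
my $integer_bits = $Config{ivsize} * 8;

print "perl $Config{version} built for $Config{archname}\n";
print "pointers: $pointer_bits bits, integers: $integer_bits bits\n";
print "this perl cannot address more than 4GB of memory\n" if $pointer_bits < 64;
```

Note that a 32-bit perl can still report 64-bit integers if it was built with 64-bit integer support, so the pointer size is the number to look at for address space.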

Finally, you can get a 64-bit database. I know, I know, I'm harping on this. But let's face it: you have 1TB of data you're trying to work with, but only 2GB of RAM. The other 998GB will simply get swapped out to disk while you're loading it from disk. This is going to be incredibly slow. Use a database: it has algorithms and code intended to deal with exactly this type of problem, written in highly-optimised (if you believe the TPC ratings) C code. Load data as you need it, and discard it when you're done with it. Put as much logic as you can into your SQL statements, and let the database fetch your data in the most efficient manner possible.
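
To make "put the logic in SQL" concrete, here is a rough sketch using DBI. It assumes DBI and DBD::SQLite are installed from CPAN, and it uses an in-memory SQLite database purely so the example is self-contained; the samples table and its columns are hypothetical, since I don't know what your flat files hold. For 1TB you would point DBI at a real database server instead.

```perl
use strict;
use warnings;
use DBI;    # assumes DBI and DBD::SQLite are installed from CPAN

# Hypothetical schema: the real columns depend on what the flat files contain.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, AutoCommit => 1 });
$dbh->do('CREATE TABLE samples (id INTEGER PRIMARY KEY, grp TEXT, value REAL)');

# Load rows inside one transaction; with real data this is a one-time bulk load.
my $insert = $dbh->prepare('INSERT INTO samples (grp, value) VALUES (?, ?)');
$dbh->begin_work;
$insert->execute(@$_) for (['a', 1.5], ['a', 2.5], ['b', 10], ['b', 30]);
$dbh->commit;

# Let the database do the aggregation, then fetch the result one row at a time
# so that only the current row has to live in perl's memory.
my $query = $dbh->prepare(
    'SELECT grp, COUNT(*), AVG(value) FROM samples GROUP BY grp ORDER BY grp');
$query->execute;
my @summary;
while (my ($grp, $count, $avg) = $query->fetchrow_array) {
    push @summary, [$grp, $count, $avg];
    print "$grp: $count rows, mean $avg\n";
}
$dbh->disconnect;
```

The point of the sketch is the shape of the loop: the grouping and averaging happen inside the database, and perl only ever sees the handful of summary rows.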

I really, honestly, think that if you cannot afford the database storage, you can't afford any solution. Storage is relatively cheap, and trying to load everything into memory is simply going to fail. The Tie::* modules are likely your next best bet as they probably also load/discard data as needed, allowing you to live within your 2GB of RAM.
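
As one example of the Tie::* route, here is a small sketch with the core Tie::File module, which presents a file as an array and reads records from disk on demand. The temporary file is only stand-in data so the example runs on its own, and the cache limit is an arbitrary number I picked.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDONLY);
use File::Temp qw(tempfile);
use Tie::File;    # core module; records are read from disk as they are accessed

# Stand-in data file (hypothetical); a real run would name an existing flat file.
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "record $_\n" for 1 .. 1000;
close $fh or die "close failed: $!";

# The memory option caps Tie::File's read cache, in bytes.
tie my @lines, 'Tie::File', $path, mode => O_RDONLY, memory => 1_000_000
    or die "cannot tie $path: $!";

my $total  = scalar @lines;    # counting records makes Tie::File scan the file
my $middle = $lines[499];      # fetches only that record; newline is chomped
print "$total lines, line 500 is '$middle'\n";

untie @lines;
```

Be aware that Tie::File also keeps a table of record offsets that the memory limit does not cover, so on a file with a very large number of lines it is not free either.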


In reply to Re^3: How do I measure my bottle ? by Tanktalus
in thread How do I measure my bottle ? by cbrain
