Answer to question (I):

I don't think it's a problem for Perl to handle this size. The problem could be your computer (and maybe your patience).

As far as I remember, a hash needs a minimum of about 55 bytes per key on top of the content. 11 million times 55 = 605 million bytes, or about 577 MiB, just for the overhead. Do you have enough memory for the whole file + overhead + OS + ...? If your hardware doesn't have enough memory, it starts swapping, and that can make your solution unacceptably slow.
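The back-of-the-envelope estimate above, as a small Perl sketch. The 55 bytes per key is the from-memory figure quoted in the paragraph, not a measured value; the real overhead depends on the perl version, the build and the key lengths, so measuring a sample of your own data (for example with the CPAN module Devel::Size) would be more reliable. The function name is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: estimated hash overhead in MiB, assuming a fixed
# per-key cost (default: the ~55 bytes/key figure quoted above).
sub hash_overhead_mib {
    my ($keys, $bytes_per_key) = @_;
    $bytes_per_key //= 55;
    return $keys * $bytes_per_key / (1024 * 1024);
}

# 11 million keys * 55 bytes = 605,000,000 bytes
printf "%.0f MiB\n", hash_overhead_mib(11_000_000);   # prints "577 MiB"
```

Remember that this is only the per-key overhead; the keys and values themselves, the rest of your program and the OS all come on top.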

Question (II): I assume that you are not a database expert, are you? For a database, 11 million records are no problem at all (depending on your table definition), but your data may be. For example, your records may only make sense in a certain context, so that NAME2 has to be interpreted differently depending on the record NAME1 before it and/or the record NAME3 after it. But if that is the case, a hash isn't the right solution either.

Just loading this data into your database may be only half of the solution. The full answer depends on how you find those 100, 10 or 45 records. Your reading algorithm has to access the database sensibly, meaning the Perl script must not read the records a zillion times. A database is no solution either if you just read the whole table sequentially (a "SELECT * FROM table;"), compute your results somehow and then pick your 100, 10 or 45 records. The overhead of storing the data in the database and retrieving it again could wipe out everything you gain by using a database.
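To illustrate the difference between "read everything and filter in Perl" and "let the database find the records", here is a minimal sketch using DBI with an in-memory SQLite database. It assumes DBI and DBD::SQLite are installed; the table, column and index names and the sample data are hypothetical stand-ins for your real records.

```perl
use strict;
use warnings;
use DBI;   # assumes DBD::SQLite is installed

# Hypothetical table with made-up sample data, for illustration only.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, AutoCommit => 1 });
$dbh->do('CREATE TABLE records (name TEXT, value INTEGER)');
my $ins = $dbh->prepare('INSERT INTO records (name, value) VALUES (?, ?)');
$ins->execute("NAME$_", $_) for 1 .. 1000;

# An index lets the database jump straight to the wanted rows.
$dbh->do('CREATE INDEX idx_records_name ON records (name)');

# The records we are after (think "the 100, 10 or 45 records").
my @wanted = qw(NAME10 NAME45 NAME100);

# Wasteful: pull every row into Perl ("SELECT *"), then filter.
my %want = map { $_ => 1 } @wanted;
my @slow = grep { $want{ $_->[0] } }
           @{ $dbh->selectall_arrayref('SELECT name, value FROM records') };

# Better: let the database do the filtering, using the index.
my $placeholders = join ', ', ('?') x @wanted;
my $fast = $dbh->selectall_arrayref(
    "SELECT name, value FROM records WHERE name IN ($placeholders) ORDER BY value",
    undef, @wanted);

printf "%d rows via full scan, %d rows via indexed query\n",
       scalar @slow, scalar @$fast;
```

Both queries return the same three rows here, but with 11 million records only the second one avoids dragging the whole table through your Perl script.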

Marshall proposed SQLite. This is really a good solution, especially if only a limited number of processes access the database at the same time. Personally I wouldn't choose MySQL, I'd prefer PostgreSQL, but this is just my opinion.
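If you go the SQLite route, how you load the data matters too. A minimal sketch, assuming DBI and DBD::SQLite are installed and that the input is a tab-separated "name, value" file (the format, table and index names are hypothetical). An in-memory string stands in for the real file so the sketch runs on its own; a real script would open the file and connect to an on-disk database such as dbname=records.db.

```perl
use strict;
use warnings;
use DBI;   # assumes DBD::SQLite is installed

# Stand-in for the real input file: hypothetical "name<TAB>value" lines.
my $data = join '', map { "NAME$_\t$_\n" } 1 .. 10_000;
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, AutoCommit => 1 });
$dbh->do('CREATE TABLE records (name TEXT, value INTEGER)');

# Wrap all inserts in one transaction: in AutoCommit mode every row is
# its own transaction, which is very slow for an on-disk database.
$dbh->begin_work;
my $ins = $dbh->prepare('INSERT INTO records (name, value) VALUES (?, ?)');
while (my $line = <$fh>) {
    chomp $line;
    my ($name, $value) = split /\t/, $line;
    $ins->execute($name, $value);
}
$dbh->commit;
close $fh;

# Creating the index after the bulk load is usually cheaper than
# maintaining it row by row during the inserts.
$dbh->do('CREATE INDEX idx_records_name ON records (name)');

my ($count) = $dbh->selectrow_array('SELECT COUNT(*) FROM records');
print "loaded $count records\n";
```

For 11 million lines you might commit every few hundred thousand rows instead of once at the end; either way, the important part is not committing after every single row.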

In reply to Re: Efficient way to handle huge number of records? by leuchuk
in thread Efficient way to handle huge number of records? by Anonymous Monk