in reply to Efficient way to handle huge number of records?
Answer to question (I):
I don't think handling data of this size is a problem for Perl. The problem could be your computer (and maybe your patience).
As far as I remember, a hash needs a minimum of 55 bytes per hash key plus the content. 11 million times 55 bytes = 605 million bytes, roughly 577 MiB for the overhead alone... Do you have enough memory for the whole file + overhead + OS + ...? If your hardware doesn't have enough memory, the machine starts swapping, and that can make your solution unacceptably slow.
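To make the arithmetic above easy to redo with your own numbers, here is a small back-of-the-envelope calculator. It is a Python sketch, not Perl. The 55-byte per-key overhead is only the figure recalled above, not a measured value. The average payload size passed to `estimate_total_mib` is a hypothetical parameter that you would replace with the real average key + value size of your records.

```python
# Back-of-the-envelope memory estimate for an in-memory hash.
# ASSUMPTION: 55 bytes of per-key overhead, the figure recalled in the post;
# the real overhead depends on the Perl version and build.
PER_KEY_OVERHEAD = 55


def hash_overhead_bytes(n_keys: int, per_key_overhead: int = PER_KEY_OVERHEAD) -> int:
    """Bookkeeping overhead only; key and value contents come on top."""
    return n_keys * per_key_overhead


def to_mib(n_bytes: int) -> float:
    """Convert bytes to MiB (1024 * 1024 bytes)."""
    return n_bytes / (1024 * 1024)


def estimate_total_mib(n_keys: int, avg_payload_bytes: int,
                       per_key_overhead: int = PER_KEY_OVERHEAD) -> float:
    """Overhead plus a HYPOTHETICAL average key + value size per record."""
    return to_mib(n_keys * (per_key_overhead + avg_payload_bytes))


overhead = hash_overhead_bytes(11_000_000)
print(overhead)                 # 605000000
print(round(to_mib(overhead)))  # 577
```

Compare the result of `estimate_total_mib` with the RAM that remains free once the OS and other processes have taken their share. If the estimate gets close, expect swapping.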
Question (II): I assume you are not a database expert, are you? 11 million records is no problem at all (depending on your table definition), but your data may be. For example, the meaning of a record may depend on its position in the sequence, so that NAME2 has to be interpreted differently depending on the record NAME1 before it and/or the record NAME3 after it. But if that is the case, a hash isn't the right solution either.
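If a record's meaning depends on its neighbours, a keyed lookup by itself (a hash, or fetching a single row) loses that context. One way to keep it is to stream the records once and look at each one together with the record before and after it. The Python sketch below shows that idea. The helper name `with_neighbors` is hypothetical and not from the thread.

```python
from typing import Iterable, Iterator, Optional, Tuple, TypeVar

T = TypeVar("T")


def with_neighbors(records: Iterable[T]) -> Iterator[Tuple[Optional[T], T, Optional[T]]]:
    """Yield (previous, current, next) for every record in a single pass.

    Only three records are held at a time, so this works on a stream
    (e.g. lines of a huge file) without loading everything into memory.
    """
    it = iter(records)
    prev: Optional[T] = None
    try:
        cur = next(it)
    except StopIteration:
        return  # empty input: nothing to yield
    for nxt in it:
        yield prev, cur, nxt
        prev, cur = cur, nxt
    yield prev, cur, None  # last record has no successor


for before, rec, after in with_neighbors(["NAME1", "NAME2", "NAME3"]):
    print(before, rec, after)
# None NAME1 NAME2
# NAME1 NAME2 NAME3
# NAME2 NAME3 None
```

In a database the same context has to be preserved explicitly, for example by storing each record's position in the file as a column, so that neighbours can be queried again later.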
Just loading the data into your database may be only half of the solution. The full answer depends on how you find these 100, 10 or 45 records. Your reading algorithm has to query the database in a sensible way, meaning the Perl script doesn't read the records a zillion times... A database is no solution either if you just read the table sequentially ("SELECT * FROM table;"), compute your results somehow and then pick your 100, 10 or 45 records. The overhead of storing the data in the database and retrieving it again could wipe out everything you gain from using a database.
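The difference between "let the database select" and "SELECT * and filter in the script" looks roughly like the Python `sqlite3` sketch below. It assumes a hypothetical table `records(name, payload)` filled with generated dummy rows. The point is the index on the lookup column plus a `WHERE` clause, which lets the database return only the wanted rows instead of the whole table.

```python
import sqlite3

# HYPOTHETICAL schema and dummy data, only to demonstrate the query pattern.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (name TEXT NOT NULL, payload TEXT)")
conn.executemany(
    "INSERT INTO records (name, payload) VALUES (?, ?)",
    ((f"NAME{i}", f"data {i}") for i in range(100_000)),
)
# Index the lookup column so WHERE name IN (...) can avoid a full table scan.
conn.execute("CREATE INDEX idx_records_name ON records (name)")
conn.commit()


def fetch_records(conn: sqlite3.Connection, names: list) -> list:
    """Let the database pick the wanted rows instead of scanning them in the script."""
    placeholders = ",".join("?" * len(names))  # "?,?,?" for three names
    sql = (f"SELECT name, payload FROM records "
           f"WHERE name IN ({placeholders}) ORDER BY name")
    return conn.execute(sql, names).fetchall()


rows = fetch_records(conn, ["NAME10", "NAME45", "NAME100"])
print(rows)  # [('NAME10', 'data 10'), ('NAME100', 'data 100'), ('NAME45', 'data 45')]
```

The rows come back in text order, which is why `NAME100` sorts before `NAME45`. SQLite limits the number of bound parameters per statement, so a very long list of names would have to be split into chunks or loaded into a temporary table and joined.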
Marshall proposed SQLite. This is a really good solution, especially if only a limited number of processes access the database at the same time. Personally I wouldn't choose MySQL; I'd prefer PostgreSQL, but that is just my opinion.
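If you go the SQLite route, the initial load is where people often lose a lot of time. Committing after every single insert forces SQLite to finish a transaction per row. The Python sketch below wraps the whole load in one transaction and builds the index only afterwards. The tab-separated `NAME<TAB>payload` input format and the table layout are assumptions for illustration. In Perl the same idea applies via DBI by turning `AutoCommit` off (or calling `begin_work`) and committing once at the end.

```python
import io
import sqlite3
from typing import Iterable, Iterator, Tuple


def parse_lines(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """ASSUMED input format: one record per line, 'NAME<TAB>payload'; blank lines skipped."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        name, _, payload = line.partition("\t")
        yield name, payload


def bulk_load(conn: sqlite3.Connection, lines: Iterable[str]) -> int:
    """Load all records in one transaction, then index; returns the row count."""
    conn.execute("CREATE TABLE IF NOT EXISTS records (name TEXT NOT NULL, payload TEXT)")
    # 'with conn' commits once when the block ends (or rolls back on error),
    # instead of one commit per inserted row.
    with conn:
        conn.executemany("INSERT INTO records (name, payload) VALUES (?, ?)",
                         parse_lines(lines))
    # Building the index after the load is usually cheaper than updating it per insert.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_records_name ON records (name)")
    return conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]


sample = io.StringIO("NAME1\tfoo\nNAME2\tbar\n\nNAME3\tbaz\n")
conn = sqlite3.connect(":memory:")
print(bulk_load(conn, sample))  # 3
```

For a real 11-million-line file you would pass an open file handle instead of `io.StringIO` and a file path instead of `":memory:"`. The generator keeps memory use flat while the rows stream into the database.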