in reply to Re^3: Algorithm advice sought for seaching through GB's of text (email) files
in thread Algorithm advice sought for seaching through GB's of text (email) files
You're worried about using more than 2GB of hard drive space?
Of course not. How did you arrive at that question?
The point is, to process the data, it has to be read from disc regardless of whether it is read directly or through a DB. But to process it through the DB, it has first to be read (from the flat files), then written (to the DB files and indexes), then re-read (from/via the DB and indexes).
Sure, if the data is structured and can be indexed in a manner that aids the query, then the final re-read may entail reading less data than the original read--but it is still duplicated or triplicated effort unless there is a known future benefit from having it stored.
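To put rough numbers on that, here is a minimal Python sketch of the IO accounting. The function names and the `db_overhead` and `query_fraction` parameters are made up for illustration; they are assumptions, not measurements of any real database.

```python
import math

GB = 1024 ** 3

def direct_scan_io(data_bytes, queries=1):
    """Bytes read when every query scans the flat files directly."""
    return data_bytes * queries

def db_route_io(data_bytes, queries=1, db_overhead=1.0, query_fraction=1.0):
    """Bytes moved when the data is first loaded into a DB, then queried.

    db_overhead    -- assumed size of DB files plus indexes relative to the raw data
    query_fraction -- assumed share of the DB that each query has to read
    """
    read_flat = data_bytes                        # initial read of the flat files
    write_db = data_bytes * db_overhead           # write the DB files and indexes
    reread = write_db * query_fraction * queries  # re-read via the DB, per query
    return read_flat + write_db + reread

def break_even_queries(data_bytes, db_overhead=1.0, query_fraction=1.0):
    """Smallest number of queries at which the DB route moves no more bytes
    than repeated direct scans, or None if it never catches up."""
    write_db = data_bytes * db_overhead
    saved_per_query = data_bytes - write_db * query_fraction
    if saved_per_query <= 0:
        return None
    return math.ceil((data_bytes + write_db) / saved_per_query)

if __name__ == "__main__":
    size = 2 * GB
    print(direct_scan_io(size) / GB, "GB for one direct scan")
    print(db_route_io(size) / GB, "GB for load plus one unindexed query")
    print(break_even_queries(size, query_fraction=0.1), "queries to break even")
```

Under these assumptions, a one-off search costs three passes over the data via the DB against one pass directly. The DB route only catches up if later queries read much less than the whole data set and there are enough of them.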
And in an IO-bound process, all that extra IO does nothing to help improve performance through parallelization, which was one of tilly's cited benefits.
Re^5: Algorithm advice sought for seaching through GB's of text (email) files
by perrin (Chancellor) on Sep 24, 2006 at 23:16 UTC