in reply to Indexing two large text files
Also note that you can prepare the hash so that the “value” associated with each key is the byte position of the start of the record in the file, and perhaps also the length of that record. Thus, even if the file itself is “large” (and by modern standards, 350M really isn’t), you are only keeping the keys and the locations of everything else in memory ... and you can retrieve the records themselves as needed.
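A minimal sketch of that idea in Perl. The filename, and the assumption that each record is one line with its key as the first whitespace-delimited field, are illustrative only; adapt the key extraction to your actual record format:

```perl
use strict;
use warnings;

# Build the index: key => [ byte offset, record length ].
my %index;
open my $fh, '<', 'records.txt' or die "open: $!";   # hypothetical filename
my $pos = tell $fh;
while ( my $line = <$fh> ) {
    my ($key) = split ' ', $line, 2;    # assumes key is the first field
    $index{$key} = [ $pos, length $line ];
    $pos = tell $fh;
}

# Retrieve a single record on demand, without rereading the file.
sub fetch_record {
    my ($key) = @_;
    my $loc = $index{$key} or return;
    seek $fh, $loc->[0], 0 or die "seek: $!";
    read $fh, my $record, $loc->[1];
    return $record;
}
```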
If you intend to do this frequently, consider also using, say, an SQLite database (a single file) for all or part of the task. That gives you, for example, the easy option of indexing the data in several different ways at the same time, all of them ultimately using the “key + starting location + size of record” strategy. A single sequential pass through the file is enough to (re-)load these indexes, and then you can use SQL JOINs to start asking questions without doing any further “programming, per se.” The file being indexed, and the records therein, could be huge, while the index-databases remain small and, of course, persistent.
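For instance, a sketch of the SQLite variant using DBI with DBD::SQLite. The database filename, table name, and column names here are all hypothetical; the point is only the load-once, query-many shape:

```perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect( 'dbi:SQLite:dbname=index.db', '', '',
    { RaiseError => 1, AutoCommit => 0 } );

# One row per record: key + starting offset + record length.
$dbh->do(<<'SQL');
CREATE TABLE IF NOT EXISTS file_a (
    key    TEXT PRIMARY KEY,
    offset INTEGER NOT NULL,
    length INTEGER NOT NULL
)
SQL

my $ins = $dbh->prepare('INSERT OR REPLACE INTO file_a VALUES (?, ?, ?)');

open my $fh, '<', 'file_a.txt' or die "open: $!";    # hypothetical filename
my $pos = tell $fh;
while ( my $line = <$fh> ) {
    my ($key) = split ' ', $line, 2;
    $ins->execute( $key, $pos, length $line );
    $pos = tell $fh;
}
$dbh->commit;
```

Load the second file into a second table (say, `file_b`) the same way, and a plain `SELECT a.key, a.offset, b.offset FROM file_a a JOIN file_b b ON a.key = b.key` answers “which keys appear in both files” with no further Perl at all.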
Naturally, it depends entirely on what you intend to do ... on which side of the technical trade-off you want to emphasize, and what price you can afford to pay to get it.
Replies are listed 'Best First'.

- Re^2: Indexing two large text files by BrowserUk (Patriarch) on Apr 09, 2012 at 18:50 UTC
- by locked_user sundialsvc4 (Abbot) on Apr 09, 2012 at 22:02 UTC
- by BrowserUk (Patriarch) on Apr 09, 2012 at 22:14 UTC
- by aaron_baugher (Curate) on Apr 10, 2012 at 15:15 UTC
- by BrowserUk (Patriarch) on Apr 10, 2012 at 16:11 UTC
- by never_more (Initiate) on Apr 10, 2012 at 11:57 UTC