in reply to Re^2: Best way to match a hash with large CSV file
in thread Best way to match a hash with large CSV file
1.5) Index the DB. This step is not to be underestimated, and it was not mentioned.
2. Perform 5000 SQL queries -- Ooooh... I did not say what kind of queries, nor how many. But essentially, yes: you can do a lot of these per second, provided that you indexed the DB correctly in step 1.5. This step will take some seconds.
-- Steps 3, 4 and 5 are super trivial reformatting steps...
3. Convert all the results records, from each of the 5000 queries, back to CSV records.
4. Write them out to the new file.
5. (Delete the SQLite DB you created!)
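For anyone who wants to see the whole flow in one place, here is a minimal sketch of steps 1.5 through 5. It is in Python with the standard-library `sqlite3` module only so that it runs without installing anything; the same sequence maps onto Perl's DBI with DBD::SQLite. The function name `match_keys`, the table name `rows`, the index name `idx_key` and the default key column `id` are all made up for illustration, and it assumes the CSV has a header row and that every record has the same number of fields.

```python
import csv
import os
import sqlite3
import tempfile


def match_keys(csv_in, csv_out, keys, key_column="id"):
    """Copy the rows of csv_in whose key_column equals one of keys to csv_out.

    Hypothetical helper: loads the CSV into a throwaway SQLite file,
    indexes the key column, runs one query per key, writes the hits
    back out as CSV, and deletes the database. Returns the row count.
    """
    fd, db_path = tempfile.mkstemp(suffix=".sqlite")
    os.close(fd)
    con = sqlite3.connect(db_path)
    written = 0
    try:
        with open(csv_in, newline="") as src:
            reader = csv.reader(src)
            header = next(reader)
            if key_column not in header:
                raise ValueError("no column named %r" % key_column)
            # Quote column names so odd header text cannot break the SQL.
            quoted = ['"%s"' % name.replace('"', '""') for name in header]
            con.execute("CREATE TABLE rows (%s)" % ", ".join(quoted))
            marks = ", ".join("?" for _ in header)
            con.executemany("INSERT INTO rows VALUES (%s)" % marks, reader)

        # Step 1.5: index the column the queries will search on.
        key_sql = quoted[header.index(key_column)]
        con.execute("CREATE INDEX idx_key ON rows (%s)" % key_sql)
        con.commit()

        # Steps 2 to 4: one indexed lookup per key, results back to CSV.
        query = "SELECT * FROM rows WHERE %s = ?" % key_sql
        with open(csv_out, "w", newline="") as dst:
            writer = csv.writer(dst)
            writer.writerow(header)
            for key in keys:
                for row in con.execute(query, (key,)):
                    writer.writerow(row)
                    written += 1
    finally:
        # Step 5: delete the SQLite DB that was created.
        con.close()
        os.remove(db_path)
    return written
```

Keys are compared as strings here, because every CSV field is loaded as text. Doing the inserts before creating the index, as above, is a design choice: the index is then built once instead of being updated on every insert.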
Update:
Creation of a DB with 266,551 lines from more than 600 source files took 78 seconds.
Fancy indexing took another 20 seconds.
Basically 100 seconds and I'm "ready to roll" with a quarter of a million lines.
I have a wimpy "Prescott" machine that is more than a decade old.