in reply to Re^2: Best way to match a hash with large CSV file
in thread Best way to match a hash with large CSV file

1. Build an SQLite DB from the file of 120,000 records.
No problem at all. This will take <1 second for a file with 120K lines.

1.5) Index the DB. This step is not to be under estimated and was not mentioned.

2. Perform 5000 SQL queries -- ...Ooooh...I did not say what kind of queries nor how many...But essentially, yes you can do a lot of these things per second, provided that you have indexed the DB in step 1.5 correctly. Some seconds will be required for this step.

--Steps 3,4,5 are super trivial reformatting steps...

3. Convert all the results records, from each of the 5000 queries, back to CSV records.

4. Write them out to the new file.

5. (Delete the SQLite DB you created!)

Update:

Creation of a DB with 266,551 lines from >600+ source files took 78 seconds.
Fancy indexing took another 20 seconds.
Basically 100 seconds and I'm "ready to roll" with 1/4 million lines.

I have a whimp "Prescott" machine. This is more than a decade old.

  • Comment on Re^3: Best way to match a hash with large CSV file

Replies are listed 'Best First'.
Re^4: Best way to match a hash with large CSV file
by BrowserUk (Patriarch) on Nov 06, 2011 at 12:36 UTC

    Seeing's believin'.

    Also, indexing a DB is a part of building it.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Ok, let's go! Give me a FTP drop site and I will give you a .zip file with data and code!

      I just ran a easy test for me with 818 files, 266,551 DB "rows", < 100 seconds to read all files, created DB, index DB in more ways from Sunday.

        < 100 seconds

        ... isn't less than 6 seconds.


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.