in reply to Searching and sorting a large DBF file

First of all, do you have an index for your table? If not, that might explain why looking up a single record takes a lot of time.

But anyway, a mere 25000 records is simple to hold into memory in Perl all at once. So my advice is: select all data from the table, and read it all into a Perl data structure, for example, HoH, where the keys are your key field, and the values are the records, as hashes. That will not take more than a few seconds. Next, go through this structure one at a time, and insert the data into the output database.

I've noticed that outputting all data to a text file, like CSV, may be quite a bit swifter, which you can then import into your dB using an import tool.

  • Comment on Re: Searching and sorting a large DBF file

Replies are listed 'Best First'.
Re: Searching and sorting a large DBF file
by tyndyll (Novice) on May 13, 2004 at 18:55 UTC

    An index? I'm not sure what you mean. One thing I have thougth of is as well as doing a count of the sort codes building a hash structure with the row number and the sort code - is this what you mean? Out of curiosity what is "too large" a hash? (coming from a web background where i'm used to trying to make everything as small as possible)

    Thanks for your quick replay too

    tyndyll

      An index is a database object used for doing fast lookups on particular pieces of data. i.e.

      select column1,colum2 from table1 where column1 = 'bob'

      would be slow if there were a million records. But, if there was a structure that was faster to do a lookup on, an index, existed for this table, lookups would be faster. Indexes are seperate from tables as they are optimized for specific pieces of data. To use an index that you create, look up your db manual, just use it in the where clause somehow.

      select col1,col2 from table1,table2 where table1.col1='bob' and table2.col2=table1.col2
      This would join fast if there was an index on col2, and would also be fast on finding 'bob' if there was an index on col1.

      Further more, indexes don't work well when you transform data. Doing ...

      select * from table where col1*2 = 50
      would never use an index, as you are comparing 50 to mutated data. If you did co1=50, it would be fine. There's an index on the values of the table, not the values that you mutate like above.