in reply to Better way to work with large TSV files?
I would write the filename and line number to a log file after each block of lines; 1000 lines is probably a good block size. The log file then gives you a way to measure progress and a record of what has been done. When there is a failure, find the last logged line number for each file in the log and resume from the line after it.
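A minimal sketch of the progress log, assuming one hypothetical `filename<TAB>line_number` record per block. The helper names `write_checkpoint` and `last_done` are illustrative, not from any library. Recovery just keeps the highest line number seen for each file and skips records that are malformed, such as a half-written last line left by a crash.

```python
import os
from typing import Dict


def write_checkpoint(log_path: str, filename: str, line_no: int) -> None:
    """Append one 'filename<TAB>line_no' record and force it to disk."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(f"{filename}\t{line_no}\n")
        log.flush()
        os.fsync(log.fileno())  # make the checkpoint survive a crash


def last_done(log_path: str) -> Dict[str, int]:
    """Return the highest logged line number per file (empty if no log yet)."""
    done: Dict[str, int] = {}
    if not os.path.exists(log_path):
        return done
    with open(log_path, encoding="utf-8") as log:
        for record in log:
            # rsplit so a tab inside the filename does not break parsing
            parts = record.rstrip("\n").rsplit("\t", 1)
            if len(parts) != 2 or not parts[1].isdigit():
                continue  # skip malformed records, e.g. a partial write
            name, line_no = parts[0], int(parts[1])
            done[name] = max(done.get(name, 0), line_no)
    return done
```

On restart, something like `start = last_done("progress.log").get("data.tsv", 0)` gives the number of lines of `data.tsv` to skip before loading again.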
Making transactions larger gives a big performance boost for bulk inserts. I would make each transaction the same size as the block used for logging line numbers: for each block, write the log entry and commit.
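A rough sketch of one transaction per block, assuming SQLite via Python's `sqlite3` module (the thread does not say which database is used), a hypothetical table `records(a, b, c)`, and TSV lines with exactly three fields. The function names and log format are illustrative. Here the commit happens just before the log entry is written, so a crash between the two makes a resumed run repeat at most one block rather than skip one.

```python
import sqlite3
from typing import List

BATCH_SIZE = 1000  # same size for transactions and progress-log entries


def _flush(conn: sqlite3.Connection, batch: List[List[str]],
           tsv_path: str, line_no: int, log_path: str) -> None:
    """Insert one block in a single transaction, then log how far we got."""
    conn.executemany("INSERT INTO records VALUES (?, ?, ?)", batch)
    conn.commit()  # one commit per block instead of one per row
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(f"{tsv_path}\t{line_no}\n")


def load_tsv(conn: sqlite3.Connection, tsv_path: str, log_path: str,
             start_after: int = 0, batch_size: int = BATCH_SIZE) -> int:
    """Load tsv_path into the hypothetical `records` table, skipping the
    first start_after lines; returns the last line number processed."""
    batch: List[List[str]] = []
    line_no = start_after
    with open(tsv_path, encoding="utf-8") as fh:
        for n, line in enumerate(fh, start=1):
            if n <= start_after:
                continue  # already loaded before the previous failure
            batch.append(line.rstrip("\n").split("\t"))
            line_no = n
            if len(batch) == batch_size:
                _flush(conn, batch, tsv_path, line_no, log_path)
                batch = []
    if batch:  # final partial block
        _flush(conn, batch, tsv_path, line_no, log_path)
    return line_no
```

Logging in the same place as the commit keeps the two block sizes tied together, so the logged line number always matches what the database has actually committed (or is one block behind it after a crash).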