in reply to Quickest way to write multiple files

I would encourage you to question your motivation for this task. Why do you think you need just 12 records per output file? And what's wrong, really, with having a single file that holds all 5000+ records?

If you're bothered now by the extra time it takes to create 400+ new files, there's a good chance you'll be bothered again whenever you need to do a global scan of those 400+ files later on (e.g. to search for some particular string).

In your later reply, you say that reading and parsing the one big file takes very little time, and that most of the time in your split routine is spent on the open/write/close for all the little files. That's the expected outcome: per-file overhead (the filesystem metadata work done on every open and close) dwarfs the cost of writing a dozen records, and you'll pay that same overhead again when reading data back from all those little files. So what benefit do the little files give you that offsets the price you pay in extra run time?

If you're trying to improve access time to any chosen record in the set by reducing the size of the file that must be read to fetch it, there are better ways to do that which don't involve writing tons of little files.

For instance, build an index table of byte offsets for each record within the one big file; if each record is uniquely identified by some sort of "id" field in the XML structure, store that id alongside the record's byte offset. Then, to read a record back, use seek() to jump directly to that offset, read to the end of the record, and parse it. That's a simple technique, and it would be hard to come up with a faster access method.
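Here's a minimal Perl sketch of that idea. The file name, the assumption that each record starts on a line like <record id="...">, and the fetch_record() helper are all illustrative guesses, not details from your post; adjust the patterns to match your actual XML.

    use strict;
    use warnings;

    my $bigfile = 'records.xml';    # hypothetical name for the one big file

    # Pass 1: scan once, recording the byte offset of each record,
    # keyed by its id attribute. Assumes each record begins on a
    # line of the form: <record id="...">
    open my $fh, '<', $bigfile or die "Can't open $bigfile: $!";
    my %offset_for;
    while (1) {
        my $pos  = tell $fh;        # offset of the line about to be read
        my $line = <$fh>;
        last unless defined $line;
        $offset_for{$1} = $pos if $line =~ /<record\s+id="([^"]+)"/;
    }

    # Random access: seek straight to a chosen record, read to its end tag.
    sub fetch_record {
        my ($id) = @_;
        my $pos = $offset_for{$id};
        die "no such record: $id\n" unless defined $pos;
        seek $fh, $pos, 0 or die "seek failed: $!";
        my $xml = '';
        while (my $line = <$fh>) {
            $xml .= $line;
            last if $line =~ m{</record>};    # assumed closing tag
        }
        return $xml;    # hand this fragment to your XML parser
    }

    print fetch_record('some-id');

If the big file doesn't change between runs, you could also save %offset_for to disk (the core Storable module's store/retrieve would do), so the index is built once rather than on every run.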
