in reply to best way to fast write to large number of files

I do not know enough about your requirements to tell whether my suggested ideas make sense. The key question is: do you really need your client files to be updated every minute, every 10 minutes, or even every hour? Probably not; I suspect you need to process the log files quite often, but not necessarily update your client files that often.

Based on these assumptions, I can think of two general types of solution.

One is to read the log files and store the daily activity in a database, then dump the database content into the client files once per day (or whatever other time interval better suits your needs). The advantage is that the overhead of opening so many files is incurred only once per day.
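
For illustration, here is a minimal sketch of that approach using DBI with SQLite. The activity table, its columns, the log layout and the clients/ directory are all made-up names for the sketch, and the table is assumed to exist already; adapt them to your real data.

    use strict;
    use warnings;
    use DBI;

    # Made-up schema: activity(client_id, line); adapt to your real log format.
    my $dbh = DBI->connect("dbi:SQLite:dbname=activity.db", "", "",
                           { RaiseError => 1, AutoCommit => 0 });

    # Frequent part: append each parsed log record to the database.
    my $ins = $dbh->prepare("INSERT INTO activity (client_id, line) VALUES (?, ?)");
    while (my $log = <STDIN>) {
        my ($client_id, $rest) = split /,/, $log, 2;   # assumed log layout
        $ins->execute($client_id, $rest);
    }
    $dbh->commit;

    # Once-per-day part: each client file is opened exactly once.
    my $ids = $dbh->selectcol_arrayref("SELECT DISTINCT client_id FROM activity");
    my $sel = $dbh->prepare("SELECT line FROM activity WHERE client_id = ?");
    for my $client_id (@$ids) {
        open my $fh, '>>', "clients/$client_id.log" or die "open: $!";
        $sel->execute($client_id);
        while (my ($line) = $sel->fetchrow_array) {
            print {$fh} $line;
        }
        close $fh;
    }
    $dbh->do("DELETE FROM activity");
    $dbh->commit;
    $dbh->disconnect;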

Another idea is to pseudo-hash your client logs into temporary files. For example, you could store in one file all logs concerning clients whose customer number ends with 00, in another file the logs for clients whose customer number ends with 01, and so on up to 99. That way, each time you process a log (and assuming you sort the entries by the last two digits of the customer number), you only need to open 100 files for writing, which means much less overhead than opening 18K files. Then, again, once per day (or whatever schedule better fits your needs), you process these temporary files to put the records into the final client files. I am fairly sure that such a mechanism would give you a huge gain.
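
Here is a minimal sketch of the bucketing step. As a variation, it keeps all 100 bucket handles open at once instead of sorting the log first; the line layout (customer number followed by a comma) and the tmp/ path are assumptions to adapt.

    use strict;
    use warnings;

    # Open the 100 bucket files once and keep all the handles in a hash.
    my %bucket;
    for my $suffix (map { sprintf "%02d", $_ } 0 .. 99) {
        open my $fh, '>>', "tmp/bucket_$suffix.log"
            or die "Cannot open bucket $suffix: $!";
        $bucket{$suffix} = $fh;
    }

    # Route each log line to the bucket named after the last two digits
    # of its customer number.
    while (my $line = <STDIN>) {
        my ($customer) = $line =~ /^(\d+),/ or next;   # assumed line layout
        my $suffix = sprintf "%02d", substr($customer, -2);
        print { $bucket{$suffix} } $line;
    }

    close $_ for values %bucket;

A separate once-per-day job then reads each bucket file and appends its records to the final client files, so the 18K files are only opened during that pass.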

Of course, the figures of 100 temporary files and once-per-day processing are just numbers I picked because they made sense to me. You may want to change both to something else if that fits your case better: it could be once per hour, and it could be more or fewer temporary files. You have to figure out the best combination based on your knowledge of the situation and actual tests on the data.

Re^2: best way to fast write to large number of files
by Hosen1989 (Scribe) on Jun 23, 2014 at 21:44 UTC

    Dear Lauren_R

    Thanks for your reply, it has very interesting ideas (I read your reply more than 5 times ^_^). And yes, what you suggested is true.

    What is in my mind now (going with the first idea) is to load the logs into a DB (we will use MySQL - thanks sundialsvc4 for the idea of using a DB), then run some GROUP BY queries and write the results to the specific files. This will reduce the number of opens and closes.
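
    Something like this is what I have in mind, as a rough sketch only: the DSN, credentials, activity table and column names are just placeholders, the real schema is not decided yet.

        use strict;
        use warnings;
        use DBI;

        # Placeholder connection; real DSN, user and password will differ.
        my $dbh = DBI->connect("dbi:mysql:database=logs", "user", "password",
                               { RaiseError => 1 });

        # NOTE: GROUP_CONCAT output is capped by group_concat_max_len,
        # so that MySQL setting may need to be raised for busy clients.
        my $sth = $dbh->prepare(q{
            SELECT client_id, GROUP_CONCAT(line SEPARATOR '\n') AS content
            FROM activity
            GROUP BY client_id
        });
        $sth->execute;
        while (my ($client_id, $content) = $sth->fetchrow_array) {
            open my $fh, '>>', "clients/$client_id.log" or die "open: $!";
            print {$fh} $content, "\n";   # one open/close per client per run
            close $fh;
        }
        $dbh->disconnect;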

    I think with the right schedule we can handle all the files without any delay.

    And we will try the second idea too, because it is also a good approach to the problem.

    We will compare the two ideas and of course choose the best ;)

    I will update shortly.

    BR

    Hosen

      Hi Hosen, I suspect that the second solution will be significantly faster, because your overall process (each record is written once and read once) takes only marginal advantage of what a database offers, while appending data to 100 files is very fast. But I'll be very interested to read your update on this.