in reply to Speed of Split

If each data point is in a separate file, why join them only to split them again? If your files are nicely arranged so that for each time interval you have eight data files, then just read those eight directly into an AoA. Can you give some examples of what the pre-joined source files look like? You stand to gain by not shelling out of Perl to join, and by not splitting. If the files are fixed record length, there may be even more optimisation possible.
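Something along these lines is what I have in mind -- a minimal sketch only, since I'm guessing at the filenames and at a two-column "time value" format; adjust to whatever your files actually look like:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read one time interval's worth of files straight into an AoA:
# $aoa->[$row] = [ $time, $value_from_file_0, ..., $value_from_file_7 ].
# Assumes every file has the same time points in the same order.
sub read_interval {
    my @files = @_;
    my @aoa;
    my $col = 1;
    for my $file (@files) {
        open my $fh, '<', $file or die "open $file: $!";
        my $row = 0;
        while (my $line = <$fh>) {
            next if $line =~ /^#/;            # skip the header comment
            my ($time, $value) = split ' ', $line;
            $aoa[$row][0]    = $time;         # same in every file, so just overwrite
            $aoa[$row][$col] = $value;
            $row++;
        }
        close $fh;
        $col++;
    }
    return \@aoa;
}

# Hypothetical naming scheme -- substitute your own:
# my $data = read_interval( map { "data$_.txt" } 0 .. 7 );
```

No temporary joined file, no external join, and only one split per input line.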

Cheers,
R.

Replies are listed 'Best First'.
Re^2: Speed of Split
by Lexicon (Chaplain) on Nov 18, 2004 at 09:47 UTC
    A fine question. I'm uncertain what assumptions I can make about the data files, as I don't control the code which generates them. Each individual data file looks like:
        # time data
        0.000000 99.537
        1.000000 100.273
        2.000000 98.169
        3.000000 105.835
        4.000000 93.013
        5.000000 96.145
        6.000000 87.040
        7.000000 97.764
        8.000000 97.811
    I have to join the data files based on the time point. I can probably assume that the time points will be ordered and identical in each file, and also a fixed column width that, in the worst case, I can calculate per file. I cannot assume the time points will always be integers. I was being conservative when I wrote this, but now it seems to be my bottleneck. I am sending an email to the other developer asking what guarantees we can work out about it.
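    If the fixed column width holds, one option is to replace split with unpack, building the template per file from its first data line. This is only a sketch under that assumption -- the field layout here is guessed from the sample above:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Derive an unpack template from one sample line such as "0.000000 99.537":
# first field plus its trailing whitespace, then the second field.
sub make_template {
    my ($sample) = @_;
    my ($t, $v) = $sample =~ /^(\S+\s+)(\S+)/
        or die "unexpected line format: $sample";
    return sprintf 'A%d A%d', length($t), length($v);
}

my $line = "0.000000 99.537";
my $tmpl = make_template($line);            # here: "A9 A6"
my ($time, $value) = unpack $tmpl, $line;   # "A" strips trailing whitespace
```

    unpack on fixed-width records is generally cheaper than a regex-based split, so it may help with exactly this kind of bottleneck.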
Re^2: Speed of Split
by Lexicon (Chaplain) on Nov 20, 2004 at 14:05 UTC
    Making some assumptions and writing some 20 lines of custom import code has made the whole program roughly 3x faster (about 5 minutes per set of data files on a 900 MHz Athlon).