In theory!
In practice:
My E: drive is brand new, barely used, and connected to a SATA-3 interface.
wc -c does almost nothing but increment a variable, and it reads a 34GB file in 192 secs, at a rate of 171MB/s:
    E:\>dir 30GB.dat
    28/02/2015  08:21    34,560,000,000 30GB.dat

    [ 0:08:11.82] E:\>wc -c 30GB.dat
    34560000000 30GB.dat

    [ 0:11:23.17] E:\>perl -E"say 34560000000 / 192 / 1024**2;"
    171.661376953125
I created 50 1GB files:
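A minimal sketch of one way to generate them, assuming files named 1GB-00.dat .. 1GB-49.dat filled with random data and written in 1MB chunks (illustrative only, not the commands actually used):

    use strict;
    use warnings;

    ## Sketch only: file names, contents and chunk size are assumptions.
    my $CHUNK  = 2**20;                ## build and write 1MB at a time
    my $CHUNKS = 2**30 / $CHUNK;       ## 1024 chunks = 1GB per file

    for my $n ( 0 .. 49 ) {
        my $name = sprintf '1GB-%02d.dat', $n;
        open my $out, '>:raw', $name or die "open '$name': $!";
        for ( 1 .. $CHUNKS ) {
            ## 1MB of random data, later treated as 64K x 16-byte records
            print {$out} pack 'L*', map { int rand 2**32 } 1 .. $CHUNK / 4;
        }
        close $out;
    }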
I then ran a script which emulates the read pattern of a 50-way merge of those files: it reads a 10MB buffer from each, then repeatedly picks one of the buffers at random, copies the next 16-byte record from it, and refills each buffer from its file as it empties.
Total runtime (projection): 85 hours; average read rate: 10MB/minute, or roughly 171KB/second.
The script, in outline:
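(A sketch of the logic only; the file names, end-of-file handling, and exact buffer management shown here are assumptions, not the verbatim listing.)

    use strict;
    use warnings;

    my $BUFSIZE = 10 * 2**20;     ## 10MB read buffer per input file
    my $RECLEN  = 16;             ## one record

    ## open all 50 inputs and prime a buffer from each
    my( @fh, @buf, @cur );
    for my $n ( 0 .. 49 ) {
        open $fh[ $n ], '<:raw', sprintf( '1GB-%02d.dat', $n ) or die $!;
        read( $fh[ $n ], $buf[ $n ], $BUFSIZE ) or die $!;
        $cur[ $n ] = 0;
    }

    my @live = 0 .. 49;           ## inputs not yet exhausted
    while( @live ) {
        my $i = $live[ rand @live ];                          ## pick an input at random

        my $rec = substr( $buf[ $i ], $cur[ $i ], $RECLEN );  ## copy 16 bytes ...
        $cur[ $i ] += $RECLEN;                                ## ... and bump CUR
        ## (the record is discarded: no comparisons, no heap, no output)

        if( $cur[ $i ] >= length $buf[ $i ] ) {               ## buffer empty: refill
            $cur[ $i ] = 0;
            read( $fh[ $i ], $buf[ $i ], $BUFSIZE )
                or @live = grep { $_ != $i } @live;           ## EOF: drop this input
        }
    }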
Note: there is no heap, merge, or insertion sort of the records being 'merged'; indeed, there are no comparisons whatsoever, and no writing to disk.
Just the generation of a random number, the copying of 16 bytes, and the adjustment of a pointer (CUR) for each record.
But it happens 3.3 billion times.
The result is that there is almost no IO activity for 98% of the time, punctuated by occasional bursts as the 10MB buffers are repopulated.
Because the random picks distribute evenly across the 50 buffers, they all empty at much the same time, so the 50 x 10MB refill reads tend to come grouped close together, roughly every 50 minutes.
With 102 buffers per 1GB file, that's 102 * 50 = 5100 buffer reads; at roughly one 10MB buffer per minute, that's 5100 minutes, or 85 hours, or 3 1/2 days, which mirrors the timings I've experienced using external sort programs.
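The same arithmetic in the form used above (buffers per 1GB file; total 10MB buffer reads, which at roughly one per minute is also minutes; then hours and days):

    perl -E"say 1024/10; say 102*50; say 102*50/60; say 102*50/60/24;"
    102.4
    5100
    85
    3.54166666666667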
I'm not going to let it run to completion. I hope you'll understand why.
Update: At the point of posting this, the process had been running for exactly 2 hours and had read 890.4MB.
That's 7.42MB/minute, which projects to 4.8 days to complete the 50GB.
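The updated projection, checked the same way:

    perl -E"printf qq[%.2f MB/min -> %.1f days\n], 890.4/120, 50*1024/(890.4/120)/60/24;"
    7.42 MB/min -> 4.8 days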