in reply to Re^3: Can you improve upon my algorithm.
in thread Can you improve upon my algorithm.
In theory!!!!
In practice!
My E: drive is brand new, barely used and connected to a SATA-3 i/f.
wc -c does almost nothing but increment a variable, and it reads a 34GB file in 192 secs, at a rate of 171MB/s:
```
E:\>dir 30GB.dat
28/02/2015  08:21    34,560,000,000 30GB.dat

[ 0:08:11.82] E:\>wc -c 30GB.dat
 34560000000 30GB.dat

[ 0:11:23.17] E:\>perl -E"say 34560000000 / 192 / 1024**2;"
171.661376953125
```
I created 50 1GB files:
```
E:\>md junk
E:\>cd Junk

E:\junk>perl -le"open O, '>00';printf O qq[%01023d\n],$_ for 1..1024**2; close O; system qq[copy 00 $_] for('01'..'50')"

13/03/2015  00:23     1,074,790,400 01
13/03/2015  00:23     1,074,790,400 02
13/03/2015  00:23     1,074,790,400 03
13/03/2015  00:23     1,074,790,400 04
13/03/2015  00:23     1,074,790,400 05
13/03/2015  00:23     1,074,790,400 06
13/03/2015  00:23     1,074,790,400 07
13/03/2015  00:23     1,074,790,400 08
13/03/2015  00:23     1,074,790,400 09
13/03/2015  00:23     1,074,790,400 10
13/03/2015  00:23     1,074,790,400 11
13/03/2015  00:23     1,074,790,400 12
13/03/2015  00:23     1,074,790,400 13
13/03/2015  00:23     1,074,790,400 14
13/03/2015  00:23     1,074,790,400 15
13/03/2015  00:23     1,074,790,400 16
13/03/2015  00:23     1,074,790,400 17
13/03/2015  00:23     1,074,790,400 18
13/03/2015  00:23     1,074,790,400 19
13/03/2015  00:23     1,074,790,400 20
13/03/2015  00:23     1,074,790,400 21
13/03/2015  00:23     1,074,790,400 22
13/03/2015  00:23     1,074,790,400 23
13/03/2015  00:23     1,074,790,400 24
13/03/2015  00:23     1,074,790,400 25
13/03/2015  00:23     1,074,790,400 26
13/03/2015  00:23     1,074,790,400 27
13/03/2015  00:23     1,074,790,400 28
13/03/2015  00:23     1,074,790,400 29
13/03/2015  00:23     1,074,790,400 30
13/03/2015  00:23     1,074,790,400 31
13/03/2015  00:23     1,074,790,400 32
13/03/2015  00:23     1,074,790,400 33
13/03/2015  00:23     1,074,790,400 34
13/03/2015  00:23     1,074,790,400 35
13/03/2015  00:23     1,074,790,400 36
13/03/2015  00:23     1,074,790,400 37
13/03/2015  00:23     1,074,790,400 38
13/03/2015  00:23     1,074,790,400 39
13/03/2015  00:23     1,074,790,400 40
13/03/2015  00:23     1,074,790,400 41
13/03/2015  00:23     1,074,790,400 42
13/03/2015  00:23     1,074,790,400 43
13/03/2015  00:23     1,074,790,400 44
13/03/2015  00:23     1,074,790,400 45
13/03/2015  00:23     1,074,790,400 46
13/03/2015  00:23     1,074,790,400 47
13/03/2015  00:23     1,074,790,400 48
13/03/2015  00:23     1,074,790,400 49
13/03/2015  00:23     1,074,790,400 50
              50 File(s)  53,739,520,000 bytes
               2 Dir(s)  1,490,904,588,288 bytes free
```
I then ran a script that simulates the read pattern of a 50-way merge of those files.
Total runtime (projection): 85 hours; average read rate: 10MB/minute, or about 171KB/second.
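Those figures follow directly from the sizes involved; in the same one-liner style as above (53,739,520,000 bytes being the combined size of the 50 files):

```
perl -E"say 10 * 1024 / 60; say 53739520000 / 1024**2 / 10 / 60"
```

The first expression gives the rate in KB/second (~171), the second the hours needed to read all 50 files at 10MB/minute (~85).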
The script performs no heap, merge, or insertion sort of the records being 'merged' (indeed, no comparisons whatsoever) and does no writing to disk.
Just the generation of a random number, the copying of 16 bytes, and the adjustment of a pointer (CUR) for each record.
But it happens 3.3 billion times.
With the result that there is almost no IO activity for 98% of the time, and then occasional bursts as 10MB buffers are repopulated.
As the nature of random is to distribute evenly, the 50 x 10MB reads tend to come grouped pretty close together; roughly every 50 minutes.
With 102 x 10MB buffers per 1GB file, that's 102 rounds of refills at roughly 50 minutes each: 102 * 50 = 5100 minutes, or 85 hours, or about 3 1/2 days; which mirrors the timings I've experienced using external sort programs.
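For reference, a minimal sketch of that access pattern might look like the following. It is not the script actually used; the file names '01'..'50', the 10MB buffer size, and the 16-byte record size are taken from the description above, and everything else is an illustrative assumption.

```perl
#!/usr/bin/perl
# Sketch only: simulate the read side of a 50-way merge with no comparisons
# and no output, reading the 1GB files '01'..'50' through 10MB buffers.
use strict;
use warnings;

use constant BUFSIZE => 10 * 1024**2;   # 10MB buffer per input file
use constant RECSIZE => 16;             # bytes copied per 'record'

my @files = ( '01' .. '50' );
my( @fh, @buf, @cur );

for my $i ( 0 .. $#files ) {
    open $fh[$i], '<:raw', $files[$i] or die "open $files[$i]: $!";
    read( $fh[$i], $buf[$i], BUFSIZE );   # prime the first 10MB buffer
    $cur[$i] = 0;                         # CUR: offset of the next record
}

my $recs = 0;
while( @fh ) {
    my $i   = int rand @fh;                            # random source file
    my $rec = substr( $buf[$i], $cur[$i], RECSIZE );   # copy 16 bytes (unused)
    $cur[$i] += RECSIZE;                               # adjust the pointer
    ++$recs;

    if( $cur[$i] >= length $buf[$i] ) {                # buffer exhausted
        $cur[$i] = 0;
        if( !read( $fh[$i], $buf[$i], BUFSIZE ) ) {    # repopulate; 0 at EOF
            close $fh[$i];
            splice @fh,  $i, 1;                        # drop the finished file
            splice @buf, $i, 1;
            splice @cur, $i, 1;
        }
    }
}
print "$recs records processed\n";
```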
I'm not going to let it run to completion. I hope you'll understand why.
Update: At the point of posting this, the process had been running for exactly 2 hours and had read 890.4MB.
That's 7.42MB/minute, which projects to 4.8 days to complete the 50GB.
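That projection can be reproduced with the same kind of one-liner as above, again using the combined size of the 50 files (53,739,520,000 bytes):

```
perl -E"say 890.4 / 120; say 53739520000 / 1024**2 / (890.4 / 120) / 60 / 24"
```

The first value is the MB/minute over the first two hours (7.42); the second is the projected days for the full set, roughly 4.8.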