I saw.
I generated a data file based on the information the OP gave me in response to my question: 30GB / 160 million records = avg. 200 bytes/record. So I used:
perl -E"printf qq[%010u,%0200u\n], $_, $_ for 1..160e6" >30GB.dat
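For readability, here is the same generator spelled out as a plain script; it is functionally equivalent to the one-liner above and does nothing beyond it:

#!/usr/bin/perl
# Expanded form of the one-liner: a 10-digit zero-padded key, a comma,
# and a 200-digit zero-padded copy of the same number as the payload.
use strict;
use warnings;

open my $out, '>', '30GB.dat' or die "open: $!";
printf {$out} "%010u,%0200u\n", $_, $_ for 1 .. 160_000_000;
close $out or die "close: $!";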
The zero-padded, fixed-width records make it easy to verify that the record found matches the record searched for:
E:\>head -2 s:30GB.dat
0000000001,00000000000000000000000000000000000000000000000000000000000
+000000000000000000000000000000000000000000000000000000000000000000000
+000000000000000000000000000000000000000000000000000000000000000000000
+000001
0000000002,00000000000000000000000000000000000000000000000000000000000
+000000000000000000000000000000000000000000000000000000000000000000000
+000000000000000000000000000000000000000000000000000000000000000000000
+000002
E:\>tail -2 s:30GB.dat
0159999999,00000000000000000000000000000000000000000000000000000000000
+000000000000000000000000000000000000000000000000000000000000000000000
+000000000000000000000000000000000000000000000000000000000000000000159
+999999
0160000000,00000000000000000000000000000000000000000000000000000000000
+000000000000000000000000000000000000000000000000000000000000000000000
+000000000000000000000000000000000000000000000000000000000000000000160
+000000
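That check can be scripted. The sketch below is illustrative only (it is not the index code used for the timings); it assumes the fixed-width layout shown above and derives the record length from the first line:

use strict;
use warnings;

open my $fh, '<:raw', '30GB.dat' or die "open: $!";

my $first  = <$fh>;            # every record is the same length,
my $recLen = length $first;    # so measure it from the first one

sub fetch_record {
    my( $recNo ) = @_;                              # 1-based record number
    seek $fh, ( $recNo - 1 ) * $recLen, 0 or die "seek: $!";
    read( $fh, my $rec, $recLen ) == $recLen or die "short read";
    return $rec;
}

my $want = 123_456_789;                             # arbitrary test record
my $rec  = fetch_record( $want );
my( $key ) = $rec =~ /^(\d{10})/;
print $key == $want ? "ok: record $want checks out\n"
                    : "MISMATCH: $rec\n";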
But I forgot to subtract the size of the record number, delimiter and EOL from the length of the data, so my 30GB.dat is actually 32GB:
E:\>dir s:30GB.*
28/02/2015 08:21 34,560,000,000 30GB.dat
28/02/2015 09:44 1,920,000,000 30GB.idx
So, whilst my data does not match his, the difference doesn't affect the indexing or the timing.
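For reference, a corrected one-liner along these lines would hit 200 bytes per record; the 187-digit data width is an assumption based on the 2-byte CRLF that Windows text-mode output writes for \n (10 + 1 + 187 + 2 = 200, so 160e6 records come to 32,000,000,000 bytes, about 29.8GiB):

perl -E"printf qq[%010u,%0187u\n], $_, $_ for 1..160e6" >30GB.dat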