in reply to optimization in file processing
Reading terabytes of file data takes time. I tried reading a 2 GByte file with this simple Perl script:

#!/usr/bin/perl
while (<>) { }
and it took 39 seconds. Scaled up to 1 terabyte (500 times as much data), that works out to about 5.4 hours.
How much of that is Perl and how much is the hard disk? When I used
cat twogig.txt > /dev/null
it still took 25 seconds, which scales to about 3.5 hours for 1 terabyte. So in my case roughly two thirds of the time is spent just reading from disk; the rest can be attributed to not reading in large chunks, i.e. to the overhead of reading line by line.
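If you want to claw back that last third, a block-read loop along these lines helps (only a sketch, not part of my benchmark; the file name and the 4 MB block size are arbitrary placeholders):

#!/usr/bin/perl
use strict;
use warnings;

# Read the file in large blocks instead of line by line.
my $file = 'twogig.txt';                 # placeholder name
open my $fh, '<:raw', $file or die "Cannot open $file: $!";

my $blocksize = 4 * 1024 * 1024;         # 4 MB per read, tune as needed
my $buffer;
while (read($fh, $buffer, $blocksize)) {
    # process $buffer here; note that a block may end in the middle of a line,
    # so you have to handle records that straddle block boundaries yourself
}
close $fh;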
Do these tests yourself and you will get the lower limit of what you can hope to achieve without either throwing faster hardware at it or preprocessing the data (if the file doesn't change all the time, you could build the hash on disk once and reuse it, as sketched below).
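For the preprocessing idea, the core Storable module is one way to keep the hash on disk between runs (again just a sketch; the file names and the tab-separated record format are assumptions, not something from this thread):

#!/usr/bin/perl
use strict;
use warnings;
use Storable qw(store retrieve);

my $cache = 'bigfile.hash.stor';         # placeholder cache file

my $href;
if (-e $cache) {
    # Cheap case: reuse the hash built in an earlier run
    $href = retrieve($cache);
} else {
    # Expensive case: one full pass over the big file, done only once
    my %hash;
    open my $fh, '<', 'bigfile.txt' or die "Cannot open bigfile.txt: $!";
    while (my $line = <$fh>) {
        chomp $line;
        my ($key, $value) = split /\t/, $line, 2;   # assumed record format
        $hash{$key} = $value;
    }
    close $fh;
    store(\%hash, $cache);
    $href = \%hash;
}
# $href now holds the lookup hash either way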
Replies are listed 'Best First'.

Re^2: optimization in file processing
  by moritz (Cardinal) on Jul 08, 2011 at 10:18 UTC
  by jethro (Monsignor) on Jul 08, 2011 at 11:57 UTC