If the comparison you are doing doesn't involve any complex transformation of the data structures or really time-consuming math, then your CPUs will mostly sit around looking at the daisies while they wait for your hard disk to deliver the data. Disk I/O is slow, REALLY slow, compared to the speed of your memory or CPUs.
So no matter how many CPUs you have to do the job, the only things that probably matter in your case are how fast your disk (or disks) can read the data and what algorithm you are using.
And if the hashes are so big that they don't fit into RAM, your machine starts to swap, i.e. it moves part of its memory contents out to the hard disk, which makes you even more dependent on hard disk speed. In the worst case your program ends up doing almost nothing except swapping; this is called 'thrashing'.
So your solution might be, depending on your circumstances:
1) Buy a faster hard disk or use a RAID.
2) Do some preprocessing of your data so that it takes up less space.
3) Buy more RAM.
4) Load one of the huge files into a database and compare the second one against it by querying the database.
5) Depending on your data, use an algorithm that avoids reading both files completely into memory, for example the merge step of a merge sort: sort both files, then walk through them in parallel.
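To make option 5 a bit more concrete, here is a minimal sketch of the merge idea, written in Python, though the same loop translates directly to Perl filehandles. It assumes both files are already sorted line by line in plain byte/code-point order; the function name `compare_sorted` and the `only_a`/`only_b` tags are made up for this illustration. Only one line per file is held in memory at a time, so the size of the files doesn't matter for RAM.

```python
def _lines(fh):
    # Yield lines without the trailing newline, so a final line
    # that lacks "\n" still compares equal to its counterpart.
    for line in fh:
        yield line.rstrip("\n")


def compare_sorted(path_a, path_b):
    """Yield (tag, line) for every line present in only one of two
    sorted files. Assumes both files use the same sort order as
    Python's string comparison."""
    with open(path_a) as fa, open(path_b) as fb:
        it_a, it_b = _lines(fa), _lines(fb)
        a, b = next(it_a, None), next(it_b, None)
        while a is not None and b is not None:
            if a == b:
                # Same line in both files: advance both.
                a, b = next(it_a, None), next(it_b, None)
            elif a < b:
                # a sorts first, so it cannot appear later in file B.
                yield ("only_a", a)
                a = next(it_a, None)
            else:
                yield ("only_b", b)
                b = next(it_b, None)
        # One file is exhausted; whatever is left in the other is unmatched.
        while a is not None:
            yield ("only_a", a)
            a = next(it_a, None)
        while b is not None:
            yield ("only_b", b)
            b = next(it_b, None)
```

If the files are not sorted yet, an external sort such as `LC_ALL=C sort` can do that on disk without loading everything into memory, and it produces the byte order this sketch expects. You still pay for the disk reads, but each file is read sequentially and nothing gets swapped.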
In reply to Re^3: changing parameters in a thread by jethro, in thread changing parameters in a thread by Anonymous Monk