If I understand correctly, you have a large text file which you break into 100 MB chunks and then calculate the checksum of each chunk independently, i.e. you are not trying to create one checksum for the whole 127 GB text.
You can save some time by setting up a pipeline: it starts by reading the 1st chunk. When that is done, the checksum calculation begins in another thread while the current thread reads the 2nd chunk in parallel. The savings depend on the ratio of the time spent reading from disk to the time spent calculating the checksum: in the best case the total drops from read time plus checksum time to roughly the larger of the two. The big objection here is that shared memory between threads in Perl is not efficient (perhaps someone can teach me otherwise) and you end up wasting more time in locking or in duplicating data between the threads. The alternative is for the reader to pass the data via a pipe to the thread or process doing the calculation.
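To make the pipe variant concrete, here is a minimal sketch using fork and two pipes instead of threads, which sidesteps the shared-memory question. The name pipelined_digests and the 64 KB block size are my own choices for illustration, it assumes a Unix-like system, and I have not benchmarked it against plain sequential reading.

```perl
use strict;
use warnings;
use Digest::SHA;
use POSIX ();

# Hypothetical helper: the parent reads $path and feeds a pipe; a forked
# child hashes that stream in $chunk_size pieces and sends back one
# SHA-256 hex digest per chunk.
sub pipelined_digests {
    my ($path, $chunk_size) = @_;
    my $block = 65536;    # assumed block size, tune to taste

    pipe(my $data_r,   my $data_w)   or die "pipe: $!";
    pipe(my $result_r, my $result_w) or die "pipe: $!";

    my $pid = fork();
    die "fork: $!" unless defined $pid;

    if ($pid == 0) {      # child: the checksum stage
        close $data_w;
        close $result_r;
        binmode $data_r;
        my ($sha, $filled, @out) = (Digest::SHA->new(256), 0);
        while (1) {
            my $want = $chunk_size - $filled;
            $want = $block if $want > $block;
            my $got = read($data_r, my $buf, $want);
            die "read: $!" unless defined $got;
            last if $got == 0;                 # parent closed the pipe
            $sha->add($buf);
            $filled += $got;
            if ($filled == $chunk_size) {      # chunk complete
                push @out, $sha->hexdigest;
                ($sha, $filled) = (Digest::SHA->new(256), 0);
            }
        }
        push @out, $sha->hexdigest if $filled; # short final chunk
        # Results are sent only after all data has been consumed, so the
        # two pipes can never block each other.
        print {$result_w} "$_\n" for @out;
        close $result_w;
        POSIX::_exit(0);
    }

    # parent: the reader stage
    close $data_r;
    close $result_w;
    binmode $data_w;
    open my $in, '<:raw', $path or die "open $path: $!";
    while (my $got = read($in, my $buf, $block)) {
        print {$data_w} $buf or die "write: $!";
    }
    close $in;
    close $data_w;        # signals end of data to the child
    chomp(my @digests = <$result_r>);
    close $result_r;
    waitpid($pid, 0);
    return @digests;
}
```

Calling pipelined_digests('file.dat', 100_000_000) would return the list of per-chunk digests in order. Whether it beats a single process depends on the read/checksum ratio mentioned above, so time it on your own data first.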
Also, it is worth investigating Digest::SHA's ability to add data from a stream as it becomes available over the pipe, and whether part of the calculation can be done before the full chunk has arrived. I do not know about this.
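For what it is worth, the object-oriented interface of Digest::SHA does accept data piecemeal through add(), so the checksum of a chunk can be updated block by block as the data arrives. A minimal sketch follows; chunk_digest is a hypothetical helper name and the block size is whatever you pass in.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha256_hex);

# Hypothetical helper: compute the SHA-256 of the next chunk (at most
# $chunk_size bytes) from $fh, feeding the digest in $block_size pieces
# as they are read. Returns undef when the handle is already at EOF.
sub chunk_digest {
    my ($fh, $chunk_size, $block_size) = @_;
    my $sha       = Digest::SHA->new(256);
    my $remaining = $chunk_size;
    my $seen      = 0;
    while ($remaining > 0) {
        my $want = $remaining < $block_size ? $remaining : $block_size;
        my $got  = read($fh, my $buf, $want);
        die "read failed: $!" unless defined $got;
        last if $got == 0;       # end of input
        $sha->add($buf);         # incremental update, no need to hold the chunk
        $remaining -= $got;
        $seen      += $got;
    }
    return $seen ? $sha->hexdigest : undef;
}

# Usage: an in-memory filehandle stands in for the real file or pipe.
my $data = join '', map { chr(65 + $_ % 26) } 0 .. 999;
open my $fh, '<', \$data or die "open: $!";
binmode $fh;
my $first = chunk_digest($fh, 400, 64);
print "incremental and one-shot digests agree\n"
    if $first eq sha256_hex(substr($data, 0, 400));
```

The result is the same digest you would get from hashing the whole chunk in one call, but only one small block has to sit in memory at a time.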
If the problem is indeed IO-bound, then compressing the files to, say, half their size will reduce IO time and increase calculation time (decompressing + calculating). If you can make that ratio 50/50, then you have a nice candidate for a pipeline that halves your overall time (and increases power consumption). You should consider this only if you intend to repeat these experiments in the future (see the next paragraph); otherwise the one-off cost of compressing leaves you with both a longer total time and a bigger electricity bill.
Lastly, if you are thinking of doing similar experiments on the same data, you could split the file in advance (unix split --bytes=100000000 file.dat) and move the pieces onto different physical disks permanently. The cost of split + move can be worth it if you intend to do these (or similar) calculations/experiments repeatedly. Suppose you move them to 3 different disks: then you can parallelise the process over 3 threads and benchmark how much of the theoretical 2/3 savings your OS and hardware actually deliver. With this setup you save time on every experiment you make in the future, but I doubt anyone has 3 physically distinct disks on a laptop.
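A rough sketch of the 3-disk idea, with one forked worker per directory rather than threads. The helper name digest_dirs_in_parallel and the mount points in the usage line are made up for illustration; it assumes a Unix-like system and that each directory holds nothing but the chunk files.

```perl
use strict;
use warnings;
use Digest::SHA;
use POSIX ();

# Hypothetical helper: fork one worker per directory (ideally one per
# physical disk); each worker hashes every plain file in its directory.
# Returns a hash mapping file path to SHA-256 hex digest.
sub digest_dirs_in_parallel {
    my (@dirs) = @_;
    my @workers;
    for my $dir (@dirs) {
        # Forking open: in the child, STDOUT is the pipe back to the parent.
        my $pid = open(my $from_child, '-|');
        die "fork: $!" unless defined $pid;
        if ($pid == 0) {
            opendir(my $dh, $dir) or die "opendir $dir: $!";
            for my $name (sort readdir $dh) {
                my $file = "$dir/$name";
                next unless -f $file;
                # addfile() reads the file itself; 'b' selects binary mode
                my $hex = Digest::SHA->new(256)->addfile($file, 'b')->hexdigest;
                print "$hex  $file\n";
            }
            closedir $dh;
            close STDOUT;            # flush results before leaving
            POSIX::_exit(0);
        }
        push @workers, $from_child;
    }

    # All workers are running by now; collect their output one by one.
    my %digest_of;
    for my $fh (@workers) {
        while (my $line = <$fh>) {
            chomp $line;
            my ($hex, $file) = split /  /, $line, 2;
            $digest_of{$file} = $hex;
        }
        close $fh;                   # also reaps that worker
    }
    return %digest_of;
}
```

Usage would look like my %sums = digest_dirs_in_parallel('/mnt/disk1/chunks', '/mnt/disk2/chunks', '/mnt/disk3/chunks'); where the three paths are placeholders for wherever the pieces end up. If the directories sit on the same physical disk the workers will mostly compete for the same head, so benchmark before trusting the 2/3 figure.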
bw, bliako
In reply to Re: SHA-256? What do you all think of this? by bliako, in thread SHA-256? What do you all think of this? by locked_user erichansen1836