in reply to Re^2: Fast parsing for cypher block chaining
in thread Fast parsing for cypher block chaining

the 8 byte buffer would result in a lot of disk thrashing compared to 64k buffer

And there shouldn't be any disk thrashing, because read and write are buffered. The buffer isn't 8 bytes as you claim, but rather a multiple of the disk sector size. The disk is not accessed every time read or write is called.

Because I thought read would be slower than sysread

sysread + syswrite + manual buffering in Perl
should be slower than
read + write + well-tested buffering in C

Note that I didn't say my way is faster, just that it might be. Benchmark to find out which is faster on your system.
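A quick sketch of such a benchmark, using the core Benchmark module (the file path and sizes here are placeholders, and a throwaway 1MB test file stands in for the real data):

```perl
use strict;
use warnings;
use Benchmark qw(timethese);
use File::Temp qw(tempfile);

# Build a throwaway 1MB test file (in real use, point $path at your own file).
my ($tmp, $path) = tempfile(UNLINK => 1);
print $tmp 'x' x (1024 * 1024);
close $tmp;

timethese(1, {
    'buffered read, 8 bytes' => sub {
        open my $fh, '<', $path or die "open: $!";
        binmode $fh;
        1 while read($fh, my $buf, 8);        # Perl's own buffering in C
        close $fh;
    },
    'sysread, 64k slabs' => sub {
        open my $fh, '<', $path or die "open: $!";
        binmode $fh;
        1 while sysread($fh, my $buf, 65536); # manual buffering in Perl
        close $fh;
    },
});
```

Only a benchmark on the target system settles the question; the relative cost of the extra function calls per 8-byte read versus the substr bookkeeping can go either way.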

Replies are listed 'Best First'.
Re^4: Fast parsing for cypher block chaining
by fluffyvoidwarrior (Monk) on Mar 01, 2006 at 07:30 UTC
    Just benchmarked a comparison between sysread/write with substr() and straight Perl IO read/write, if anyone is still interested.
    The results are a bit of a shock!
    Parsing a 700MB file in 8-byte chunks took 81 seconds with the sysread method. Using Perl's buffered read/write it took 522 seconds. It seems sysread and handling your own buffering can produce performance gains of up to 700% - which is what I'm looking for.
    Here's my code, just in case I've done anything dumb (I'm assuming using OO IO is OK):
    $infile  = new IO::File;
    $outfile = new IO::File;
    $infile->open($input_filepath);
    $outfile->open(">$output_filepath");
    for ($chunk_counter = 0; $chunk_counter < $infile_num_chunks + 1; $chunk_counter = $chunk_counter + 1) {
        $infile->read($buffer, 8);
        $outfile->write($buffer, 8);
    }

    I originally used a while construct for loop control, but then thought maybe it was slowing things down. It was: using "while" took 522 secs, while using the counter as above took 449 secs. Either way, sysread and substr() is loads faster.

      If it's done right, it should be faster than this buggy code you show here. I say buggy because you're assuming the input file's size is a multiple of 8 bytes.

      Using IO::File is probably much slower than using read and write directly. Objects are slower than their non-object equivalents.

      If you want to use sysread and syswrite, you can use them with the solution in Re: Fast parsing for cypher block chaining, which is what you should be using.
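A sketch of that combined approach - sysread into a 64k buffer, substr to peel off 8-byte blocks, with the short final chunk handled rather than assumed away. The file names are placeholders, a small demo input is generated so the sketch is self-contained, and the per-block work is stubbed out:

```perl
use strict;
use warnings;

# Hypothetical file names for illustration.
my ($in_path, $out_path) = ('input.dat', 'output.dat');

# For demonstration, create a small input whose size is NOT a multiple of 8:
# 100 bytes = 12 full 8-byte blocks plus a 4-byte tail.
open my $mk, '>', $in_path or die "open $in_path: $!";
binmode $mk;
print $mk 'A' x 100;
close $mk;

open my $in,  '<', $in_path  or die "open $in_path: $!";
open my $out, '>', $out_path or die "open $out_path: $!";
binmode $_ for $in, $out;

my $BUF = 64 * 1024;    # 64k slab; a multiple of 8, so only the
                        # final sysread can return a partial block
while (my $got = sysread($in, my $buffer, $BUF)) {
    for (my $off = 0; $off < $got; $off += 8) {
        # The last block may be shorter than 8 bytes; substr handles that.
        my $block = substr($buffer, $off, 8);
        # ... per-block work (e.g. the CBC step) would go here ...
        syswrite($out, $block) == length($block) or die "write: $!";
    }
}
close $in;
close $out or die "close: $!";
```

The inner loop never needs a separate "last bufferful" pass, because the loop bound is the number of bytes actually read, not the buffer size.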

        Yes, thanks.
        I'll compare this with using Crypt::CBC as you suggested earlier.
        As for the "bug" you spotted - my intention was to parse the last 64k buffer outside this loop, which removes the need for huge numbers of loop-exit condition tests ("while" actually does slow it down), replacing such tests with a simple counter for all except the last bufferful.
        I suppose I should have posted more code to give a better feel for things, but I didn't want to dump too much on people when I thought I'd isolated the crux of the speed issue: how to optimise splitting a 64k string into 8-byte chunks. Using Crypt::CBC as you suggested may mean that I don't need to do this anyway.
        Again, thanks for your help. I'll experiment further and update this post if you're still interested.
      That is not benchmark code
        It's a snippet of a subroutine timed using the Benchmark module, i.e.
        use Benchmark;
        my $interval = timeit(1, \&wibble);
        where wibble is my subroutine. I'm also MD5-hashing the input and output files before and after my subroutine is called, to make sure I'm not outputting garbage, and also checking the file size.
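That verification step can be sketched with the core Digest::MD5 module; fingerprint is a hypothetical helper name, not anything from the code above:

```perl
use strict;
use warnings;
use Digest::MD5;

# Hypothetical helper: returns the MD5 hex digest of a file's
# contents together with its size in bytes.
sub fingerprint {
    my ($path) = @_;
    open my $fh, '<', $path or die "open $path: $!";
    binmode $fh;
    my $md5 = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return ($md5, -s $path);
}
```

Calling fingerprint on the input file before and after the run, and on the output file, gives a quick sanity check that the parsing pass didn't mangle or truncate anything.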
        Anyway, just by watching a clock I can tell the difference between nearly 10 mins execution and 1.5 mins.
        Can anyone rewrite the above code to increase performance by a factor of 5 or more ?
        Obviously, if the two approaches were in the same ballpark I wouldn't bother with the sysread stuff - I'm not trying to complicate things on purpose.