Interesting. Here are the results of applying some of this to the real code. My real file is about 200MB.
The full code also does a few more things: it takes a list of filehandles to chain together and a maximum total length to read. This complicates things slightly, but doesn't seem to affect performance beyond noise levels.
## Original code:
98.50user 0.18system 1:38.68elapsed 100%CPU

## With ikegami's change:
85.84user 0.24system 1:26.15elapsed 99%CPU

## Using 8k blocksize for reading:
24.62user 0.25system 0:24.87elapsed 100%CPU

## Using 32k blocksize:
26.74user 0.37system 0:27.16elapsed 99%CPU
Going higher than 8K blocks doesn't seem to help much on my system; the extra time taken with 32K blocks is probably noise and would smooth out over multiple runs.
Since the actual input can be a stream (in this case usually the output of some other program piped into mine), I couldn't move the "multiple of 4 bytes" check out of the loop; but since it's no longer in the tightest loop, that doesn't appear to matter much either.
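Since the actual block-read code isn't posted here, the following is only a minimal sketch of the approach described above: read in 8K blocks, carry over any trailing bytes so the checksum always sees whole 32-bit words, and keep working even when the input is a pipe. The checksum itself is assumed to be a simple sum of big-endian 32-bit words (via unpack's %32N* checksum feature); the real algorithm in the thread may differ.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Minimal sketch only; the real code also chains multiple filehandles and
# enforces a maximum read length, which is omitted here.
my $BLOCKSIZE = 8 * 1024;

sub checksum_fh {
    my ($fh) = @_;
    my ($sum, $carry) = (0, '');

    while (read($fh, my $block, $BLOCKSIZE)) {
        $block = $carry . $block if length $carry;

        # Only process whole 32-bit words; keep the 0-3 trailing bytes for
        # the next block, since a stream can hand us arbitrary read sizes.
        my $usable = length($block) - length($block) % 4;
        $carry = substr($block, $usable);
        $sum = ($sum + unpack('%32N*', substr($block, 0, $usable))) % 2**32;
    }
    die "input length is not a multiple of 4 bytes\n" if length $carry;
    return $sum;
}

binmode STDIN;
printf "%08x\n", checksum_fh(\*STDIN);
```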
And for completeness, here's my time using your C code converted to XS:
0.38user 0.40system 0:00.95elapsed 82%CPU
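For comparison, here is roughly what that C-in-Perl approach can look like. This sketch uses Inline::C rather than hand-written XS (Inline::C generates and compiles the XS glue automatically), and the sum32 function and its word-sum algorithm are illustrative assumptions, not the actual C code from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative only: Inline::C stands in for the hand-written XS used in the
# timings above, and the word-sum algorithm is an assumption.
use Inline C => <<'END_C';
unsigned int sum32(SV *buf) {
    STRLEN len;
    unsigned char *p = (unsigned char *)SvPV(buf, len);
    unsigned int sum = 0;
    STRLEN i;

    /* Sum the buffer as big-endian 32-bit words; any 1-3 trailing bytes
       are ignored in this sketch. */
    for (i = 0; i + 3 < len; i += 4)
        sum += ((unsigned int)p[i]   << 24)
             | ((unsigned int)p[i+1] << 16)
             | ((unsigned int)p[i+2] <<  8)
             |  (unsigned int)p[i+3];
    return sum;
}
END_C

binmode STDIN;
my $data = do { local $/; <STDIN> };   # slurp for the demo; block reads also work
printf "%08x\n", sum32($data);
```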
One interesting thing is that CPU usage has dropped from 100% down to about 80%, suggesting that at this point reading the file might be the bottleneck.
Of course, that's all relative; after going from 1:38 to 0:00.95, any remaining optimization is of no practical use to me.
Thanks to both you and ikegami for showing me that I discarded the C-in-Perl approach too soon. While the end result is a bit more annoying to use, the performance gains are certainly worth it.
In reply to Re^2: Improving performance of checksum calculation
by Crackers2
in thread Improving performance of checksum calculation
by Crackers2