Beefy Boxes and Bandwidth Generously Provided by pair Networks
Do you know where your variables are?
 
PerlMonks  

Improving performance of checksum calculation

by Crackers2 (Parson)
on May 30, 2009 at 03:31 UTC ( [id://767001]=perlquestion: print w/replies, xml ) Need Help??

Crackers2 has asked for the wisdom of the Perl Monks concerning the following question:

I have the following code to calculate a checksum (the checksum algoritm is not under my control):

open my $fh, "<", "myfile" or die; my $check_value = 0; while (1) { my $buf; my $cnt = read $fh, $buf, 4; if (!defined $cnt) { die("Error while reading from input file: $!"); } elsif ($cnt==0) { last; } elsif ($cnt%4) { die("File or chunk does not seem to have a length that's a multi +ple of 4 bytes"); } my $val = unpack("L",$buf); $check_value ^= $val; my $rotate = $check_value & 0x80000000; $check_value = ($check_value << 1 ) & 0xffffffff; if ($rotate) { $check_value |= 0x01; } }

It works but is terribly slow. Any suggestions to speed it up?. This looks like the kind of thing that would be a lot faster in C, and trivially so in assembler because of the nice rotate instructions; I'd rather avoid shelling out to a subprocess just for this though.

Replies are listed 'Best First'.
Re: Improving performance of checksum calculation
by BrowserUk (Patriarch) on May 30, 2009 at 06:47 UTC

    If you can slurp the file, you can reduce the time by 75%. See the comments after the end block show various steps to arriving at this:

    #! perl -sw use 5.010; use strict; use Time::HiRes qw[ time ]; my $start = time; my $size = -s $ARGV[ 0 ]; die("File not a multiple of 4 bytes") unless ( $size % 4 ) == 0; open my $fh, "<:raw", $ARGV[ 0 ] or die; my $data; { local $/; $data = <$fh>; } close $fh; open $fh, '<', \$data; my $check_value = 0; my $buf; while( read( $fh, $buf, 4 ) ) { $check_value ^= unpack 'L', $buf; $check_value = ( ( $check_value & 0x7fffffff ) << 1 ) | ( $check_value >> 31 ); } say $check_value; printf "Took: %f seconds\n", time() -$start; __END__ ## Original code C:\test>767001.pl 767001-small.dat 2779316821 Took: 13.011000 seconds ## Eliminate in loop conditions; C:\test>767001.pl 767001-small.dat 2779316821 Took: 11.577000 seconds ## Use Ikegami's re-write C:\test>767001.pl 767001-small.dat 2779316821 Took: 10.453000 seconds ## Use RAM-file C:\test>767001.pl 767001-small.dat 2779316821 Took: 3.148000 seconds

    However, then I tried a different tack--reading and processing the file in larger chunks--and halved that again while eliminating the slurp limitation. Again see the comments, but 64K chunks seems optimal on my system:

    #! perl -sw use 5.010; use strict; use Time::HiRes qw[ time ]; use constant BUFSIZ => 64 * 1024; my $start = time; my $size = -s $ARGV[ 0 ]; die("File not a multiple of 4 bytes") unless ( $size % 4 ) == 0; open my $fh, "<:raw", $ARGV[ 0 ] or die; my $check_value = 0; my $buf; while( read( $fh, $buf, BUFSIZ ) ) { for ( unpack 'L*', $buf ) { $check_value ^= $_; $check_value = ( ( $check_value & 0x7fffffff ) << 1 ) | ( $check_value >> 31 ); } } say $check_value; printf "Took: %f seconds\n", time() -$start; __END__ ## Process 4K chunks C:\test>767001-buk 767001-small.dat 2779316821 Took: 1.771000 seconds ## Process 16K chunks C:\test>767001-buk 767001-small.dat 2779316821 Took: 1.750000 seconds ## Process 64K chunks C:\test>767001-buk 767001-small.dat 2779316821 Took: 1.775000 seconds ... ## Process 256K chunks C:\test>767001-buk 767001-small.dat 2779316821 Took: 1.804000 seconds

    But for ultimate speed, combine the above technique with some Inline::C and you can reduce the time to 1/1000th of your original:

    #! perl -sw use 5.010; use strict; use Inline C => Config => BUILD_NOISY => 1; use Inline C => <<'END_C', NAME => '_767001', CLEAN_AFTER_BUILD => 0; U32 checksum( U32 sum, SV *buf ) { int i; int n = SvCUR( buf ) >> 2; U32 *p = (U32 *)SvPVX( buf ); for( i = 0; i < n; ++i ) { sum ^= p[ i ]; sum = ( ( sum & 0x7fffffff ) << 1 ) | ( sum >> 31 ); } return sum; } END_C use Time::HiRes qw[ time ]; use constant BUFSIZ => 64 * 1024; my $start = time; my $size = -s $ARGV[ 0 ]; die("File not a multiple of 4 bytes") unless ( $size % 4 ) == 0; open my $fh, "<:raw", $ARGV[ 0 ] or die; my $sum = 0; my $buf; while( read( $fh, $buf, BUFSIZ ) ) { $sum = checksum( $sum, $buf ); } say $sum; printf "Took: %f seconds\n", time() -$start; __END__ C:\test>767001-IC 767001-small.dat 2779316821 Took: 0.014622 seconds

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Interesting. Here's the results of applying some of this to the real code. My real file is about 200MB.

      The full code also has a few more things; it gets a list of filehandles to chain together, and a maximum length to read in total; this complicates things slightly but doesn't seem to affect performance above noise levels.

      ## Original code: 98.50user 0.18system 1:38.68elapsed 100%CPU ## With ikegami's change: 85.84user 0.24system 1:26.15elapsed 99%CPU ## Using 8k blocksize for reading 24.62user 0.25system 0:24.87elapsed 100%CPU ## using 32k blocksize 26.74user 0.37system 0:27.16elapsed 99%CPU

      Going higher than 8K blocks doesn't seem to help much on my system; the extra time taken with 32K blocks is probably noise and would smoothen out over multiple runs.

      Since the actual input can be a stream (in this case usually the output of some other program piped to mine), I couldn't move the "multiple of 4 bytes" check out of the loop; but since it's no longer in the tightest loop that doesn't appear to matter much either.

      And for completeness, here's my time using your C code converted to xs:

      0.38user 0.40system 0:00.95elapsed 82%CPU

      One interesting thing at this point is that we've dropped CPU usage from 100% down to about 80%, indicating that at this point reading the file might be the bottleneck.

      Of course, that's all relative; after going from 1:38 or 0:00.95 any remaining optimization isn't of any practical purpose for me.

      Thanks to both you and Ikegami for showing me I discarded the C-in-Perl too soon. While the end result is a bit more annoying to use, the performance gains are certainly worth it.

Re: Improving performance of checksum calculation
by ikegami (Patriarch) on May 30, 2009 at 04:43 UTC
    • You should be using binmode on your input handle.

    • ($check_value & 0x7fffffff) << 1
      might be safer (more portable) than
      ($check_value << 1) & 0xffffffff

    • The following is faster (but nowhere near as fast as C would be):

      $check_value ^= unpack('L', $_); $check_value = ( ($check_value & 0x7fffffff) << 1 ) | ( $check_value >> 31 );
    • Using C doesn't require shelling out. perlxstut or Inline::C will allow you to access C code from perl.

      * You should be using binmode on your input handle.

      In the real program this is in a subroutine which gets pre-opened handles passed in; I just added a quick open at the top here to have a runnable snippet.

      * ($check_value & 0x7fffffff) << 1 might be safer (more portable) than ($check_value << 1) & 0xffffffff

      It more clearly shows what it's actually trying to do as well I think. Thanks.

      * The following is faster (but nowhere near as fast as C would be):
      $check_value ^= unpack('L', $_); $check_value = ( ($check_value & 0x7fffffff) << 1 ) | ( $check_value >> 31 );

      As shown in my reply to BrowserUK below (which I'll post in a little bit), this cut my time from 1:38 to 1:26, or about a 12% improvement.

      * Using C doesn't require shelling out. perlxstut or Inline::C will allow you to access C code from perl.

      I looked at Inline::C, but that appears to require a compiler to be available at runtime, which isn't guaranteed. Still it might be worth doing a "use it if it's there" type thing.

      So in the end I took BrowserUK's C code below and did all the magic xs invocations to get an .so; I'm including that inside an eval, so if the .so is not there it'll fall back to the slow perl implementation.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://767001]
Approved by GrandFather
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others surveying the Monastery: (8)
As of 2024-04-18 09:21 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found