in reply to Re^2: Perl Hashes in C?
in thread Perl Hashes in C?

This was the code example from FreeBeerReekingMonk: ----------------------------- my %SEEN; my $key; open( FH, '<', "image.raw" ) or die $!; binmode(FH); my $startpos = 10; # skip header seek(FH, $startpos, 0); $/ = \42; # set record size while( <FH> ) { $key = join("", unpack 'H42', $_ ); $SEEN{$key}++; # not sure how to calculate stop } for my $key (keys %SEEN){ print "key $key seen $SEEN{$key} times\n"; } # Add method to go from hex key to numeric rgb values if you want ================================= I have no idea where the \42 came from. I changed it to a record size=6. Then changed the Unpack from 'H42' to 'H12' 12 hex = 6 bytes. It works: # The RIGHT ANSWER IS: RGB hash has 27645898 distinct colors. sub rgb_hash2() { my %SEEN; my $key; # 27645898 colors 216913920 B 36152320 pixels! $raw="pf-2015.0531-249465.pprgb.7360x4912.raw"; # $fs=-s $raw; # RAW file size in BYTES open( FH, '<', "$raw" ) or die $!; binmode(FH); # RGB 16 BITS/CHANNEL = 48BITs/PIXEL = 6 BYTE QUANTA -> 12 HEX digits $/ = \6; # set record size. REF 2 INT, "\6" == 6 B #$/ = \42; # set record size while( <FH> ) { # Smash each 12 Hex rec into 48 BIT UINT48? #$key = join("", unpack 'S3', $_ ); $key = join("", unpack 'H12', $_ ); $SEEN{$key}++; } close FH; $sk=scalar keys %SEEN; # Scalar Keys. How many COLORS? printf("Seen{%d}, file_size=%2.6fMB, -> %1.6f B/key\n", $sk, $fs/1.0E6, $fs/$sk); return $sk; } I also tried it with 3 Unsigned Shorts too. ===================================================== ===================================================== Monk Ed suggested something at a bit lower level with a sysread into a tiny 4096 * 6 byte buffer and nibbling the buffer substr. If you monkey with the buffer size and try a size not evenly divisible by 6, it breaks. D'oh! =============================================== Ed's ~ code (may be partially mangled by me). %Image = (); keys %Image = 4096 * 128; $rdamt = 4194304 unless $rdamt; # 4 MByte! 4096 * 6; $compraw = -s $file; $pixels = int ( $compraw / 6 ); open ( $in, "<", "$file") or die "$!\n"; while( 1 ) { $size = sysread( $in, $buffer, $rdamt ); if ( $size == 0 ) { last; } while( $buffer ) { if ( length( $buffer ) < 6 ) { last; } $key = substr( $buffer, 0, 6, '' ); $Image{$key}++; } } close $in; $uniq = keys %Image; =============================================== PERFORMANCE SYSREAD with buffer size Event 'Read_Raw_16384' elapsed time = 0.520 min = 10.43%. Event 'Read_Raw_24576' elapsed time = 0.519 min = 10.41%. Event 'Read_Raw_4096' elapsed time = 0.525 min = 10.54%. 10.54% -> Read_Raw_4096 10.43% -> Read_Raw_16384 10.41% -> Read_Raw_24576 << Ed's Magic Number 4096 * 6 Average ~= 0.5 min Record read, unpack to HEX Event 'Read_Raw_Hex0' elapsed time = 0.781 min = 15.68%. Event 'Read_Raw_Hex1' elapsed time = 0.711 min = 14.26%. Event 'Read_Raw_Hex2' elapsed time = 0.704 min = 14.13%. Event 'Read_Raw_Hex3' elapsed time = 0.707 min = 14.18%. 15.68% -> Read_Raw_Hex0 14.26% -> Read_Raw_Hex1 14.18% -> Read_Raw_Hex3 14.13% -> Read_Raw_Hex2 Average ~= 0.7 min Ed's magic number beat out the others tried by an eyelash! Tried Unpacking 3 UShorts also. Seems like it should be faster to unpack 3, 2 byte ints than 12 hex: Event 'Read_Raw_UINT16x3Pack0' elapsed time=0.959 min=25.82% Event 'Read_Raw_UINT16x3Pack1' elapsed time=0.919 min=24.76% Event 'Read_Raw_UINT16x3Pack2' elapsed time=0.918 min=24.73% Event 'Read_Raw_UINT16x3Pack3' elapsed time=0.917 min=24.70% 25.82% -> Read_Raw_UINT16x3Pack0 24.76% -> Read_Raw_UINT16x3Pack1 24.73% -> Read_Raw_UINT16x3Pack2 24.70% -> Read_Raw_UINT16x3Pack3 NO! Hex Unpack is faster that UINT16 on the same data and by a sizable margin, The low level sysreads without buffering Gets the GOLD MEDAL! There was another INLINE C method from BrowserUk that looked perfect, but it used GD with "TRUECOLOR" which on my system means 24 bit and my data are all 16 so I didn't try it. What started as a 2 hour problem has been whittled down to 20 seconds. Ye Ha! Thanks for all the great ideas! Brian ======================================================== Inline C version from BrowserUk: by BrowserUk on Aug 11, 2015 at 20:28 UTC This drops into Inline::C to explore the bitmap and construct the hash +. It takes 23 seconds to process a 125 megapixel image: #! perl -slw use strict; use Inline C => Config => BUILD_NOISY => 1; use Inline C => <<'END_C', NAME => '_1138218', CLEAN_AFTER_BUILD =>0; typedef unsigned __int64 U64; typedef unsigned int U32; typedef unsigned char U8; HV* countColors( SV *img ) { STRLEN l; U32 *rgba = (U32*)SvPVx( img, l ), i; HV* counts = newHV(); l /= 4; for( i = 0; i < l; ++i ) { if( hv_exists( counts, (char*)&rgba[ i ], 4 ) ) { SV **val = hv_fetch( counts, (char*)&rgba[ i ], 4, 0 ); SvIV_set( *val, SvIV( *val ) + 1 ); } else { SV *val = newSViv( 1 ); hv_store( counts, (char*)&rgba[ i ], 4, val, 0 ); } } return counts; } END_C use Time::HiRes qw[ time ]; use Data::Dump qw[ pp ]; use List::Util qw[ sum ]; use GD; GD::Image->trueColor(1); my $i = GD::Image->new( $ARGV[ 0 ] ) or die $!; my $gd = $i->gd; my( undef, $width, $height, undef, undef ) = unpack 'nnnCV', substr( $ +gd, 0, 11, '' ); printf "Width:%u height:%u pixels:%u\n", $width, $height, length( $gd +) / 4; my $start = time; my $href = countColors( $gd ); printf "Took %f seconds\n", time() -$start; printf "Found %u colors in %u pixels\n", scalar( keys %{ $href } ), su +m( values %{ $href } );