Re: Perl Hash Performance Hits Brick Wall!

by flexvault (Monsignor)
on Aug 22, 2015 at 09:27 UTC ( [id://1139489] )

in reply to Perl Hash Performance Hits Brick Wall!


I tried to stay out of this one, but 'what the heck!'

The type of work you're doing requires a deeper understanding of Perl than you want to spend time on. In your write-up you mention passing a hash from a subroutine back to the main program, but you failed to include that in your limited code description. As others have already pointed out, passing a hashref to the subroutine would have avoided having to copy a 27 million key/value hash back to the main program. From what you show below, though, it looks like the hash is built in the main program anyway.
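To illustrate the hashref point, here is a minimal sketch. The subroutine name `count_colors` and the sample data are made up for illustration and are not from your code. The caller owns the hash and hands the subroutine a reference, so nothing has to be copied on return:

```perl
use strict;
use warnings;

# Hypothetical helper: fills the caller's hash in place through a reference,
# so no key/value list is copied when the subroutine returns.
sub count_colors {
    my ($counts, $data) = @_;                   # $counts is a hash reference
    my $pos = 0;
    while ($pos + 6 <= length $data) {
        $counts->{ substr($data, $pos, 6) }++;  # one 6-byte pixel per key
        $pos += 6;
    }
    return;                                     # nothing to hand back
}

my %rgb2c;
my $sample = "AAAAAA" . "BBBBBB" . "AAAAAA";    # three made-up 6-byte pixels
count_colors(\%rgb2c, $sample);

printf "%d distinct colors\n", scalar keys %rgb2c;
```

Returning `%rgb2c` as a list from the subroutine would instead flatten every key and value onto the stack and rebuild the hash in the caller.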

Tiny, active code segment ...

    $sr_len = sysread(IN, $buf, $bsize);   # SysRead Length
    last if $sr_len == 0;
    while($buf) {
        $rgb=substr($buf, 0, 6, '');       # Nibble 6 bytes
        $rgb2c{$rgb}++;
    }

But why didn't you build the array while building your hash???

    @rgb = keys %rgb2c;    << 1 line takes 28.648 min

The above code is probably not doing what you think: 'keys' returns the keys in no specific order, so '@rgb' is not in image order. This is where knowing how Perl allocates an array and a hash helps. You could have done the following ( untested code ):

    my $fsize   = -s '[your file]';     ## Find out how big the image is
    my $arrsize = int($fsize / 6);      ## Size of the array and hash
    my $counter = 0;
    my %rgb2c;
    keys(%rgb2c) = $arrsize;            ## Allocate one large memory hash!
    my @rgb;
    $#rgb = $arrsize - 1;               ## Allocate one large memory array!
    while ( 1 ) {
        $sr_len = sysread(IN, $buf, $bsize);    # SysRead Length
        last unless $sr_len;                    # 0 at EOF, undef on error
        while ( length $buf ) {                 # length test: a "0" remainder is still data
            $rgb = substr($buf, 0, 6, '');      # Nibble 6 bytes
            $rgb2c{$rgb}++;
            $rgb[$counter] = $rgb;              # Build array as you go along
            $counter++;
        }
    }
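
The inner loop above can be tried on a small in-memory buffer. This is only a sketch with made-up pixel data, not your image. It shows that an array filled while nibbling keeps the pixels in file order, while `keys` has to be sorted if you want a repeatable order:

```perl
use strict;
use warnings;

# Made-up data: four 6-byte "pixels", one of them repeated.
my $buf = "CCCCCC" . "AAAAAA" . "CCCCCC" . "BBBBBB";

my (%rgb2c, @rgb);
while (length $buf) {
    my $rgb = substr($buf, 0, 6, '');   # nibble 6 bytes off the front
    $rgb2c{$rgb}++;                     # count occurrences per color
    push @rgb, $rgb;                    # remember pixels in file order
}

# keys() promises no particular order, so sort for a stable listing.
my @distinct = sort keys %rgb2c;

print "pixels in file order: @rgb\n";
print "distinct colors (sorted): @distinct\n";
```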

At this point you have a hash that counts how often each color occurs (so the number of keys is the number of distinct colors) and an array that represents the exact image in 48 bit increments. By pre-allocating the hash and array you let Perl reserve the memory for each up front, instead of repeatedly growing them (and, for the hash, re-splitting its buckets) as millions of entries are added.
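
A minimal sketch of the two pre-allocation idioms, with a made-up element count `$n` standing in for the real array size. Assigning to `keys` asks Perl for at least that many hash buckets (it rounds up to a power of two), and assigning to `$#rgb` extends the array to the requested length:

```perl
use strict;
use warnings;

my $n = 1_000;                  # hypothetical element count for the demo

my %rgb2c;
keys(%rgb2c) = $n;              # pre-size the hash's bucket array

my @rgb;
$#rgb = $n - 1;                 # extend the array to $n (undef) slots

$rgb[$_] = "px$_" for 0 .. $n - 1;   # fill by index; the array never has to grow
$rgb2c{$_}++ for @rgb;               # one key per made-up pixel value

printf "array slots: %d, hash keys: %d\n", scalar @rgb, scalar keys %rgb2c;
```

Pre-sizing the hash only reserves buckets; the keys and values themselves are still allocated as they are inserted, so how much time this saves is something to measure on your own data.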

Spend a little more time learning Perl and using efficient algorithms, and you'll have tools that will make you proud.


"Well done is better than well said." - Benjamin Franklin