in reply to bit by overhead

So - is there any way at all to reduce the overhead of these arrays that doesn't involve pack and unpack?

Why exclude solutions that involve pack and unpack?

C:\test>p1
$x = [ '20110106', map( rand( 1e5 ), 1..4 ), int( rand 1000 ) ];;
print total_size $x;;
440
print for @$x;;
20110106
4663.0859375
274.658203125
53915.4052734375
11352.5390625
145

Using join & split (72% saving):

$y = join $;, @$x;;
print total_size $y;;
120
print for split $;, $y;;
20110106
4663.0859375
274.658203125
53915.4052734375
11352.5390625
145

Using pack & unpack (78% saving):

$z = pack 'C/A* d4 n', @$x;;
print total_size $z;;
96
print for unpack 'C/A* d4 n', $z;;
20110106
4663.0859375
274.658203125
53915.4052734375
11352.5390625
145
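For anyone without a REPL to hand, the same comparison runs as an ordinary script (Devel::Size must be installed; the exact byte counts will differ between Perl builds and versions):

```perl
#! perl -slw
use strict;
use Devel::Size qw[ total_size ];

# One record: an 8-char date string, four floats, and a small integer.
my $x = [ '20110106', map( rand( 1e5 ), 1 .. 4 ), int( rand 1000 ) ];

my $y = join $;, @$x;              # flat string, $; as the separator
my $z = pack 'C/A* d4 n', @$x;     # counted string + 4 doubles + uint16

printf "array: %d bytes\n", total_size $x;
printf "join : %d bytes\n", total_size $y;
printf "pack : %d bytes\n", total_size $z;

# Round-trip check: the date and integer survive exactly, and the
# floats survive because 'd' stores a full native double.
my @back = unpack 'C/A* d4 n', $z;
print 'round-trip ok' if $back[0] eq $x->[0] and $back[5] == $x->[5];
```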

Of course, there is some performance penalty for splitting or unpacking the arrays when they are used, but it's not onerous:

#! perl -slw
use strict;
use Devel::Size qw[ total_size ];
use Benchmark qw[ cmpthese ];

my @AoA = map[ '20110106', map( rand( 1e5 ), 1..4 ), int( rand 1000 ) ], 1 .. 1e5;
my @AoS = map{ join $;, @$_; } @AoA;
my @AoP = map{ pack 'C/A* d4 i', @$_; } @AoA;

print 'AoA: ', total_size \@AoA;
print 'AoS: ', total_size \@AoS;
print 'AoP: ', total_size \@AoP;

cmpthese 10, {
    AoA => sub {
        my $sum;
        for my $i ( 0 .. $#AoA ) {
            $sum += $AoA[ $i ][ $_ ] for 1 .. 5;
        }
        # print $sum;
    },
    AoS => sub {
        my $sum;
        for my $i ( 0 .. $#AoS ) {
            my @a = split $;, $AoS[ $i ];
            $sum += $a[ $_ ] for 1 .. 5;
        }
        # print $sum;
    },
    AoP => sub {
        my $sum;
        for my $i ( 0 .. $#AoP ) {
            my @a = unpack 'C/A* d4 i', $AoP[ $i ];
            $sum += $a[ $_ ] for 1 .. 5;
        }
        # print $sum;
    },
};
__END__
C:\test>880868
AoA: 70400176
AoS: 13549296    80% saving
AoP: 10400176    85% saving
       Rate  AoS  AoP  AoA
AoS 0.977/s   -- -52% -78%
AoP  2.05/s 110%   -- -53%
AoA  4.39/s 349% 114%   --

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Replies are listed 'Best First'.
Re^2: bit by overhead
by Anonymous Monk on Jan 06, 2011 at 18:31 UTC
    "I have tried packing and unpacking the data to store it as scalars, but this causes performance to be even slower than without the cache." It turns out that the overhead is onerous - or at least onerous enough that I may as well just not even use the cache.
      "I have tried packing and unpacking the data to store it as scalars, but this causes performance to be even slower than without the cache." It turns out that the overhead is onerous - or at least onerous enough that I may as well just not even use the cache.

      Then you are doing it wrong.

      There is no way that unpacking six values from a packed string should be slower than reading those same six values from disk. And querying them from a DB will be far slower still.
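That claim is easy to check with a micro-benchmark along these lines (a sketch: the scratch-file name is arbitrary, timings are machine-dependent, and a real DB query would add network and parsing cost on top of the file read):

```perl
#! perl -slw
use strict;
use Benchmark qw[ cmpthese ];

# Compare unpacking six values from an in-memory packed string
# against re-reading the same six values from a file on disk.
my @rec    = ( '20110106', map( rand( 1e5 ), 1 .. 4 ), int( rand 1000 ) );
my $packed = pack 'C/A* d4 n', @rec;

my $tmp = 'rec.tmp';    # arbitrary scratch-file name
open my $out, '>', $tmp or die $!;
print $out join ',', @rec;
close $out;

cmpthese -1, {
    unpack => sub { my @a = unpack 'C/A* d4 n', $packed },
    disk   => sub {
        open my $in, '<', $tmp or die $!;
        my @a = split ',', scalar <$in>;
        close $in;
    },
};

unlink $tmp;
```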

      I guess it is time you started posting some of your code and let us see what you are doing wrong.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
        Here is the code to encache/decache. I'm not sure what the problem could be, unless I shouldn't be lexically scoping the rows or using A10 instead of something else in the pack/unpack.
sub to_cache {
    my $ticker = shift;
    my $data   = shift;
    my @list;
    foreach ( @$data ) {
        push @list, pack( "A10FFFFL", @$_ );
    }
    $data_cache{ $ticker } = \@list;
}

sub from_cache {
    my $ticker = shift;
    my $data   = $data_cache{ $ticker };
    my @rval;
    foreach ( @$data ) {
        my @row = unpack( "A10FFFFL", $_ );
        push @rval, \@row;
    }
    return \@rval;
}
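One quick way to rule out a correctness problem in that template is a round-trip check. This sketch condenses the two subs above (same "A10FFFFL" template, written with map; %data_cache and the sample row are assumptions for the test): 'A10' pads to 10 bytes and strips trailing spaces on unpack, 'F' is a native NV, and 'L' an unsigned 32-bit integer, so all six fields should survive intact.

```perl
#! perl -slw
use strict;

our %data_cache;

sub to_cache {
    my ( $ticker, $data ) = @_;
    $data_cache{ $ticker } = [ map pack( 'A10FFFFL', @$_ ), @$data ];
}

sub from_cache {
    my $ticker = shift;
    return [ map [ unpack 'A10FFFFL', $_ ], @{ $data_cache{ $ticker } } ];
}

# Hypothetical row: date, four OHLC-style floats, and a volume-like int.
my $rows = [ [ '20110106', 1.5, 2.25, 3.5, 4.75, 42 ] ];
to_cache( 'ABC', $rows );
my $back = from_cache( 'ABC' );
print 'ok' if $back->[0][0] eq '20110106' and $back->[0][5] == 42;
```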