Re^7: bit by overhead

Making a small change to the implementation of your routines yields a 3x performance improvement:

#! perl -slw
use strict;
use Benchmark qw[ cmpthese ];

my %data_cache;

sub to_cache {
    my $ticker = shift;
    my $data = shift;

    my @list;

    foreach(@$data) {
        push @list, pack("A10FFFFL", @$_);
    }

    $data_cache{$ticker} = \@list;
}

sub from_cache {
    my $ticker = shift;
    my $data = $data_cache{$ticker};
    my @rval;

    foreach (@$data) {
       my @row = unpack("A10FFFFL", $_);
       push @rval, \@row;
    }

    return \@rval;
}

my %cache2;
sub to_cache2 {
    my( $ticker, $data ) = @_;

    $cache2{$ticker} = [ map pack("A10FFFFL", @$_), @$data ];
}

sub from_cache2 {
    my $ticker = shift;
    return [ map unpack("A10FFFFL", $_), $cache2{$ticker} ];
}


our @AoA = map[
    '20110106', map( rand( 1e5), 1..4 ), int( rand 1000 )
], 1 .. 251;

cmpthese -1, {
    orig => sub {
        to_cache( $_, \@AoA )      for 1 .. 100;
        my $ref = from_cache( $_ ) for 1 .. 100;
    },
    mod1 => sub {
        to_cache2( $_, \@AoA )      for 1 .. 100;
        my $ref = from_cache2( $_ ) for 1 .. 100;
    },
};

__END__
C:\test>880868-2
       Rate orig mod1
orig 8.13/s   -- -76%
mod1 33.4/s 310%   --

Do you use all 251 sets of 6 values every time you retrieve the data from the cache?

The gist of where I'm going with this is that if you don't use them all each time, you might be better off using a two-level cache so that you unpack less data each time.

Or, if you do use all the values for each ticker each time, then could you not cache the results of whatever you do with them, rather than the raw data itself?
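
To make the two-level idea concrete, here is a minimal sketch. It assumes rows are looked up by index and that only some rows are needed per retrieval; the names store_rows, fetch_row, %packed and %unpacked are hypothetical and not taken from the OP's code. Packed rows sit in the first level, and a row is unpacked (and remembered) only when it is first requested.

```perl
use strict;
use warnings;

my $TEMPLATE = 'A10FFFFL';    # same row layout as the benchmark above

my %packed;      # level 1: ticker => array ref of packed row strings
my %unpacked;    # level 2: ticker => { row index => unpacked row (array ref) }

# Hypothetical helper: pack every row once and store it under the ticker.
sub store_rows {
    my ( $ticker, $data ) = @_;
    $packed{$ticker} = [ map { pack $TEMPLATE, @$_ } @$data ];
    delete $unpacked{$ticker};    # forget rows unpacked from older data
    return;
}

# Hypothetical helper: unpack a single row on first use, then reuse it.
sub fetch_row {
    my ( $ticker, $i ) = @_;
    my $rows = $packed{$ticker} or return;
    return unless defined $rows->[$i];
    return $unpacked{$ticker}{$i} //= [ unpack $TEMPLATE, $rows->[$i] ];
}
```

Whether this wins depends on how many of the 251 rows a typical retrieval touches; if it is nearly all of them, the second level only adds bookkeeping.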

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

Re^8: bit by overhead by ikegami (Patriarch) on Jan 06, 2011 at 21:42 UTC
"Or, if you do use all the values for each ticker each time"

...you could collapse the remaining array too.

Update: It's slower though??

my %cache3;
sub to_cache3 {
    my( $ticker, $data ) = @_;
    $cache3{$ticker} = pack "(A10FFFFL)*", map @$_, @$data;
}

sub from_cache3 {
    my $ticker = shift;
    return [ map [ unpack "A10FFFFL", $_ ], $cache3{$ticker} =~ /.{46}/sg ];
}

       Rate mod2 orig mod1
mod2 9.43/s   --  -1% -72%
orig 9.52/s   1%   -- -72%
mod1 33.7/s 257% 253%   --
Re^9: bit by overhead by BrowserUk (Patriarch) on Jan 06, 2011 at 22:12 UTC
I tried almost the same thing with the same result:

my %cache3;
sub to_cache3 {
    my( $ticker, $data ) = @_;
    $cache3{$ticker} = pack "(A10FFFFL)*", map @$_, @$data;
}

sub from_cache3 {
    my $ticker = shift;
    return [ map[ unpack("A10FFFFL", $_) ], unpack '(A[A10FFFFL])*', $cache3{$ticker} ];
}

C:\test>880868-2
       Rate mod2 orig mod1
mod2 18.9/s   --  -3% -76%
orig 19.4/s   3%   -- -75%
mod1 78.5/s 316% 304%   --
Re^8: bit by overhead by Anonymous Monk on Jan 07, 2011 at 16:07 UTC
Perhaps I'm missing something here:

sub from_cache2 {
    my $ticker = shift;
    return [ map unpack("A10FFFFL", $_), $cache2{$ticker} ];
}

Shouldn't that be:

sub from_cache {
    my $ticker = shift;
    my $ref = $data_cache{$ticker};
    return [ map [unpack("A10FFFFL", $_)], @$ref ];
}

The other way doesn't seem to work, since map wants an array, not an array reference. And since the data is stored in a multidimensional array, shouldn't each row be pushed as a reference?
Re^9: bit by overhead by BrowserUk (Patriarch) on Jan 07, 2011 at 16:40 UTC
You are correct, that is a bug in the benchmark. I'd code it this way:

sub from_cache2 {
    my $ticker = shift;
    return [ map unpack("A10FFFFL", $_), @{ $cache2{$ticker} } ];
}

Unfortunately, it takes the edge off the speedup :(

C:\test>880868-2
       Rate orig mod1
orig 8.01/s   -- -36%
mod1 12.4/s  55%   --

Still worth having but less so.

If you are the OP of this thread, then you should seriously consider the ideas in the last paragraph of my previous post.
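
For anyone following along, here is a small sketch of the two points raised in this sub-thread: map given a reference iterates exactly once, and without the inner brackets the unpacked fields come back as one flat list rather than one array ref per row. The sample rows and variable names are made up for illustration.

```perl
use strict;
use warnings;

my $TEMPLATE = 'A10FFFFL';
my @rows     = ( [ '20110106', 1, 2, 3, 4, 5 ], [ '20110107', 6, 7, 8, 9, 10 ] );
my $packed   = [ map { pack $TEMPLATE, @$_ } @rows ];

# Count how many items map actually visits in each case.
my $seen_ref   = () = map { 1 } $packed;     # the reference itself: one item
my $seen_deref = () = map { 1 } @$packed;    # dereferenced: one item per packed row

# Without inner brackets every field lands in a single flat list.
my $flat = [ map { unpack $TEMPLATE, $_ } @$packed ];

# With inner brackets each packed string becomes its own row.
my $nested = [ map { [ unpack $TEMPLATE, $_ ] } @$packed ];
```

Note that the corrected from_cache2 above returns the flat form. If the caller expects one array ref per row, as the original from_cache returns, the nested form is the like-for-like comparison, and the benchmark would need re-running with it.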
Re^10: bit by overhead by Anonymous Monk on Jan 07, 2011 at 17:39 UTC
I have indeed tried your suggestion:

sub to_cache {
    my( $ticker, $data ) = @_;
    $data_cache{$ticker} = pack "(A10FFFFL)*", map @$_, @$data;
}

sub from_cache {
    my $ticker = shift;
    return [ map[ unpack("A10FFFFL", $_) ], unpack '(A[A10FFFFL])*', $data_cache{$ticker} ];
}

But it doesn't seem to be working. Is this code assuming that the packed rows are stored together as a single scalar?
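
A hedged sketch of the single-scalar variant, for reference. It assumes that every row for a ticker is packed back to back into one string, and that to_blob and from_blob (hypothetical names) are the only code touching %blob. One thing to watch: per the pack documentation, unpacking with 'A' strips trailing spaces and NULs, so slicing records with 'A' can clip the final L field when its trailing bytes are zero. This sketch slices with 'a' instead. Whether that is what went wrong above is only a guess.

```perl
use strict;
use warnings;

my $TEMPLATE = 'A10FFFFL';

# Bytes per packed row, computed rather than hard-coded because the size
# of F (a Perl NV) depends on how perl was built.
my $RECLEN = length( pack( $TEMPLATE, '', (0) x 5 ) );

my %blob;    # ticker => one string holding every packed row back to back

# Hypothetical helper: flatten all rows and pack them into a single scalar.
sub to_blob {
    my ( $ticker, $data ) = @_;
    $blob{$ticker} = pack "($TEMPLATE)*", map { @$_ } @$data;
    return;
}

# Hypothetical helper: cut the scalar into fixed-length records, then
# unpack each record into its own row.
sub from_blob {
    my $ticker = shift;
    return [] unless defined $blob{$ticker};

    # 'a' returns each slice byte for byte, with no stripping
    return [ map { [ unpack $TEMPLATE, $_ ] } unpack "(a$RECLEN)*", $blob{$ticker} ];
}
```

Both routines must agree on the storage format: if anything else still stores or reads an array ref under the same hash key, the unpack will be handed the wrong kind of value.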
Re^11: bit by overhead by BrowserUk (Patriarch) on Jan 07, 2011 at 18:14 UTC