Hi to all,
I've done some benchmarking on deserializing data and don't understand why deserializing data that was serialized by Storable is slower than using a plain pipe-separated serialization.
Maybe I'm missing the obvious, made a mistake, or misinterpreted what I read about Storable.
I've got sets of data which I want to store in a database on disk using only the core modules that come with Perl. That obviously means DBM files (there are some 10K datasets to store).
I'm more concerned about retrieval (frequent reads on the DB) than about storing.
The code below therefore only represents retrieval, not storage.
I'm trying to serialize a data structure so that it can be stored using the $DB_HASH format.
The benchmark compares deserialization of pipe-separated values against deserialization of data frozen with Storable's freeze/thaw functions.
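The benchmark only covers reads. For context, the write side that would produce the two formats could look something like the following. This is a minimal sketch, not the real storage code: the helper names to_pipes and from_pipes are made up for illustration, and the pipe format is only safe as long as no value contains a space or a pipe.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# Same shape as the benchmark data: key 1 holds a list, the rest are scalars.
my %record = (
    1 => [ '123', '456', '678' ],
    map { $_ => "value_$_" } 2 .. 8,
);

# Hypothetical helper: list joined with spaces, fields joined with pipes.
# Assumes no value contains a space or a pipe.
sub to_pipes {
    my ($rec) = @_;
    return join '|', join( ' ', @{ $rec->{1} } ), @{$rec}{ 2 .. 8 };
}

# Hypothetical helper: the inverse of to_pipes().
sub from_pipes {
    my ($string) = @_;
    my %rec;
    @rec{ 1 .. 8 } = split /\|/, $string;
    $rec{1} = [ split / /, $rec{1} ];
    return \%rec;
}

# Both strings are what would end up as a value in the DBM file.
my $pipe_string = to_pipes( \%record );
my $frozen      = freeze( \%record );

# Reading back: both routes should yield the same structure.
my $from_pipes = from_pipes($pipe_string);
my $from_thaw  = thaw($frozen);

print "pipe format: $pipe_string\n";
print "frozen size: ", length($frozen), " bytes\n";
```

The frozen string is binary, so it can go into a DBM value as-is, but it is not readable by eye the way the pipe format is.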
Here is the code:
use strict;
use warnings;
use Benchmark qw( :all );
use Storable qw(freeze thaw);
my (%data, %hash_b, %hash_d);
# data
%data = (
    1 => ['123','456','678'],
    2 => 'value_2',
    3 => 'value_3',
    4 => 'value_4',
    5 => 'value_5',
    6 => 'value_6',
    7 => 'value_7',
    8 => 'value_8',
);
# prepare simulating retrieved data
my $item_1 = join(' ', @{ $data{'1'} });
my $pipe_serialized = $item_1 . '|' . $data{'2'} . '|' . $data{'3'} . '|' . $data{'4'}
                    . '|' . $data{'5'} . '|' . $data{'6'} . '|' . $data{'7'} . '|' . $data{'8'};
my $storable_serialized = freeze(\%data);
cmpthese( -1, {
    # serialized using pipes as a delimiter
    a => sub {
        my @ary  = split(/\|/, $pipe_serialized);
        my %hash = ();
        @hash{'1','2','3','4','5','6','7','8'} = @ary;
        $hash{'1'} = [ split(/ /, $hash{'1'}) ];
    },
    b => sub {
        %hash_b = ();
        @hash_b{'1','2','3','4','5','6','7','8'} = split(/\|/, $pipe_serialized);
        $hash_b{'1'} = [ split(/ /, $hash_b{'1'}) ];
    },
    # serialized using Storable
    c => sub {
        my $hash_ref = thaw($storable_serialized);
        my %hash     = %$hash_ref;
    },
    d => sub {
        %hash_d = %{ thaw($storable_serialized) };
    },
} );
# check results
use Data::Dumper;
print "hash_b:\n",Dumper(\%hash_b),"\n\n\n";
print "hash_d:\n",Dumper(\%hash_d),"\n\n\n";
And here is the output on my box:
RESULT:
       Rate    d    c    a    b
d   41155/s   -- -25% -49% -59%
c   55138/s  34%   -- -31% -45%
a   80388/s  95%  46%   -- -20%
b  100486/s 144%  82%  25%   --
Ok then, now my questions:
(1) I do understand why _b_ is faster than _a_; I guess it's because of @ary acting as a middleman.
(2) I do not understand why _c_ is faster than _d_. Why does $hash_ref as a middleman speed things up here?
(3) Most importantly, I don't understand why both methods using Storable are slower than the other two, which use plain split for deserialization. From what I read, I thought Storable was
(a) fast
(b) intended to be used for serializing Perl data structures.
Is there any known point (rule of thumb maybe) where Storable is faster than plain join/split stringification?
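One way I could probe for such a crossover myself is to repeat the comparison with a growing number of fields per record. The following is only a sketch under assumptions: the helper build_samples is made up for illustration, the field counts (8, 64, 512) are arbitrary picks rather than measured thresholds, and the records are flat, so it says nothing about deeper nesting, where the hand-written split parsing would get more involved.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use Storable qw(freeze thaw);

# Hypothetical helper: build one flat record with $fields scalar values
# and return its keys plus both serialized forms.
sub build_samples {
    my ($fields) = @_;
    my %record = map { $_ => "value_$_" } 1 .. $fields;
    my @keys   = sort { $a <=> $b } keys %record;
    return ( \@keys, join( '|', @record{@keys} ), freeze( \%record ) );
}

# Field counts are arbitrary picks, not measured thresholds.
for my $fields ( 8, 64, 512 ) {
    my ( $keys, $pipe_string, $frozen ) = build_samples($fields);

    print "\n$fields fields per record:\n";
    cmpthese( -1, {
        # rebuild the hash from the pipe-separated string
        pipe_split    => sub { my %h; @h{@$keys} = split /\|/, $pipe_string; return; },
        # rebuild the hash reference from the frozen string
        storable_thaw => sub { my $h = thaw($frozen); return; },
    } );
}
```

Each pass prints its own comparison table, so any change in the ranking between record sizes would show up directly.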
Any hint appreciated.
Thx for reading.
RL