Re: Benchmark on deserializing data

1- b() is faster than a() because of the extra @ary. When benchmarking very fast operations, the overhead of creating variables is not negligible. Besides that, a() creates a new %hash on each iteration of the benchmark, while b() reuses %hash_b (did you forget the 'my'?).

2- Same as above: you are not doing the same thing. c() creates a new %hash on each iteration, while d() uses a global %hash_d. If you add a 'my' before %hash_d, then d() is faster than c().

3- Well... Storable is slower than split (in my experience, split is very, very fast), so why use Storable?

- It is less code, so it is faster to write, and you don't need to test, debug or document it yourself.

- It is safer. For example, your code fails if the data contains '|'. And if you add escape-sequence decoding, how much faster will split still be?

- Later you can change the serialized structure without updating the serializer and deserializer.

- It is not that slow: it is nowhere near twice as slow as the split-based code. The following sub is faster than a():

sub { my $hash_e = thaw($storable_serialized ); }
# and then use $hash_e->{'2'}

It is easier, bah :-) Anyway, you will probably end up spending more time fetching the data from the DB than deserializing it.
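To make the "safer" point concrete, here is a minimal sketch. It assumes the split-based format is a flat key|value|key|value string (a guess at the original layout, not the actual code from the benchmark), and the sub names and the sample record are made up for illustration. It only shows what happens when a value contains the delimiter.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# Hypothetical record: the value under key '2' contains the delimiter.
my %record = ( 1 => 'value_1', 2 => 'left|right', 3 => 'value_3' );

# Naive split-based format (assumed layout): key|value|key|value|...
sub naive_serialize {
    my ($h) = @_;
    return join '|', map { ( $_ => $h->{$_} ) } sort keys %$h;
}

sub naive_deserialize {
    my ($s) = @_;
    my @fields = split /\|/, $s;
    push @fields, undef if @fields % 2;    # pad so the list stays even
    my %h = @fields;
    return \%h;
}

# Storable round trip: thaw() hands back a reference directly.
sub storable_roundtrip {
    my ($h) = @_;
    return thaw( freeze($h) );
}

my $naive    = naive_deserialize( naive_serialize( \%record ) );
my $storable = storable_roundtrip( \%record );

print "split:    key 2 = $naive->{2}\n";
print "Storable: key 2 = $storable->{2}\n";
print "split lost key 3\n" unless exists $naive->{3};
```

With the split version, the embedded '|' shifts every following field, so key 2 comes back truncated and key 3 disappears. The Storable version returns the value unchanged.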

Hope this helps, José

Re^2: Benchmark on deserializing data by shift8 (Novice) on Apr 27, 2007 at 03:11 UTC
Another option is XML::Dumper. It is much slower, even than Storable, but it shares all of the benefits of Storable mentioned by José (and I'll second that the fetch/store will possibly be a bigger bottleneck than the serialize/deserialize). It also has the advantage of storing in a human-readable format, which is useful for debugging.

Here are the results with e = XML::Dumper, but with freeze and pl2xml moved into the comparison functions as well (for a round-trip idea, fwiw):

---------
shift8@2axxon:~$ perl t.pl
Benchmark: running a, b, c, d, e for at least 1 CPU seconds...
a: 1 wallclock secs ( 1.14 usr + 0.00 sys = 1.14 CPU) @ 25784.21/s (n=29394)
b: 1 wallclock secs ( 1.09 usr + 0.00 sys = 1.09 CPU) @ 32879.82/s (n=35839)
c: 1 wallclock secs ( 1.05 usr + 0.01 sys = 1.06 CPU) @ 4610.38/s (n=4887)
d: 1 wallclock secs ( 1.05 usr + 0.00 sys = 1.05 CPU) @ 4265.71/s (n=4479)
e: 1 wallclock secs ( 1.04 usr + 0.00 sys = 1.04 CPU) @ 322.12/s (n=335)
       Rate      e      d      c      a      b
e    322/s     --   -92%   -93%   -99%   -99%
d   4266/s  1224%     --    -7%   -83%   -87%
c   4610/s  1331%     8%     --   -82%   -86%
a  25784/s  7905%   504%   459%     --   -22%
b  32880/s 10107%   671%   613%    28%     --

<perldata>
 <hashref memory_address="0x82252f8">
  <item key="1">
   <arrayref memory_address="0x814bc28">
    <item key="0">123</item>
    <item key="1">456</item>
    <item key="2">678</item>
   </arrayref>
  </item>
  <item key="2">value_2</item>
  <item key="3">value_3</item>
  <item key="4">value_4</item>
  <item key="5">value_5</item>
  <item key="6">value_6</item>
  <item key="7">value_7</item>
  <item key="8">value_8</item>
 </hashref>
</perldata>
Re^2: Benchmark on deserializing data by RL (Monk) on Apr 27, 2007 at 16:33 UTC
Thanks to all, but especially to you: you explained it so well that I now understand the reason for the output I got. You also revealed the obvious, which I had missed completely, by giving the $hash_e example. The function I wrote the benchmark for needs to return a hash-ref anyway. Shame on me for not thinking about it before :) Thx and greetings from Europe. RL