Greetings, DomX
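I tried Storable and Sereal, followed by a parallel demonstration. Testing was done on macOS. To capture memory consumption (e.g. with top -o CPU on macOS), uncomment the busy loop line. Uncompressed Sereal was both faster and lighter on memory than Storable (3078 megabytes and 1.549 seconds versus 4102 megabytes and 2.106 seconds). With compression enabled, Sereal used the least memory (2104 megabytes) at roughly Storable's speed (2.170 seconds).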
Storable (updated):
use strict;
use warnings;
use feature qw(say);
use Storable qw(freeze thaw);
my $data = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_';
# $data .= $data for 1..10; # 2^16 65536
# $data .= $data for 1..11; # 2^17 131072
# $data .= $data for 1..12; # 2^18 262144
# $data .= $data for 1..13; # 2^19 524288
# $data .= $data for 1..14; # 2^20 1048576
# $data .= $data for 1..15; # 2^21 2097152
# $data .= $data for 1..16; # 2^22 4194304
# $data .= $data for 1..17; # 2^23 8388608
# $data .= $data for 1..18; # 2^24 16777216
# $data .= $data for 1..19; # 2^25 33554432
# $data .= $data for 1..20; # 2^26 67108864
# $data .= $data for 1..21; # 2^27 134217728
# $data .= $data for 1..22; # 2^28 268435456
# $data .= $data for 1..23; # 2^29 536870912
$data .= $data for 1..24; # 2^30 1073741824
say 'data : '.length($data);
my $frozen = freeze(\$data);
my $thawed = thaw($frozen);
say 'frozen : '.length($frozen);
say 'thawed : '.length($$thawed);
# simulate busy loop: 4102 megabytes in top; 2.106 seconds
# 1 for 1..400_000_000;
__END__
data : 1073741824
frozen : 1073741844
thawed : 1073741824
Sereal (updated):
use strict;
use warnings;
use feature qw(say);
use Sereal::Encoder qw(encode_sereal);
use Sereal::Decoder qw(decode_sereal);
my $data = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_';
# $data .= $data for 1..10; # 2^16 65536
# $data .= $data for 1..11; # 2^17 131072
# $data .= $data for 1..12; # 2^18 262144
# $data .= $data for 1..13; # 2^19 524288
# $data .= $data for 1..14; # 2^20 1048576
# $data .= $data for 1..15; # 2^21 2097152
# $data .= $data for 1..16; # 2^22 4194304
# $data .= $data for 1..17; # 2^23 8388608
# $data .= $data for 1..18; # 2^24 16777216
# $data .= $data for 1..19; # 2^25 33554432
# $data .= $data for 1..20; # 2^26 67108864
# $data .= $data for 1..21; # 2^27 134217728
# $data .= $data for 1..22; # 2^28 268435456
# $data .= $data for 1..23; # 2^29 536870912
$data .= $data for 1..24; # 2^30 1073741824
say 'data : '.length($data);
my $frozen = encode_sereal(\$data);
my $thawed = decode_sereal($frozen);
say 'frozen : '.length($frozen);
say 'thawed : '.length($$thawed);
# simulate busy loop: 3078 megabytes in top; 1.549 seconds
# 1 for 1..400_000_000;
__END__
data : 1073741824
frozen : 1073741837
thawed : 1073741824
Sereal with compression enabled:
use strict;
use warnings;
use feature qw(say);
use Sereal::Encoder qw(encode_sereal);
use Sereal::Decoder qw(decode_sereal);
my $data = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_';
# $data .= $data for 1..10; # 2^16 65536
# $data .= $data for 1..11; # 2^17 131072
# $data .= $data for 1..12; # 2^18 262144
# $data .= $data for 1..13; # 2^19 524288
# $data .= $data for 1..14; # 2^20 1048576
# $data .= $data for 1..15; # 2^21 2097152
# $data .= $data for 1..16; # 2^22 4194304
# $data .= $data for 1..17; # 2^23 8388608
# $data .= $data for 1..18; # 2^24 16777216
# $data .= $data for 1..19; # 2^25 33554432
# $data .= $data for 1..20; # 2^26 67108864
# $data .= $data for 1..21; # 2^27 134217728
# $data .= $data for 1..22; # 2^28 268435456
# $data .= $data for 1..23; # 2^29 536870912
$data .= $data for 1..24; # 2^30 1073741824
say 'data : '.length($data);
my $frozen = encode_sereal(\$data, { compress => 1 });
my $thawed = decode_sereal($frozen);
say 'frozen : '.length($frozen);
say 'thawed : '.length($$thawed);
# simulate busy loop: 2104 megabytes in top; 2.170 seconds
# 1 for 1..400_000_000;
__END__
data : 1073741824
frozen : 52428830
thawed : 1073741824
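The timings quoted in the comments above can be approximated from inside the script as well. What follows is only a minimal sketch, not necessarily how the numbers in this post were measured: it assumes Time::HiRes for wall-clock timing, uses a much smaller payload (2^20 characters) so it finishes quickly, and the `timed` helper is a hypothetical name introduced for illustration. It uses Storable only, so it runs on a stock Perl.

```perl
use strict;
use warnings;
use feature qw(say);
use Storable qw(freeze thaw);
use Time::HiRes qw(time);

# Hypothetical helper: run a code ref once, return (elapsed seconds, result).
sub timed {
    my ($code) = @_;
    my $start  = time();
    my $result = $code->();
    return (time() - $start, $result);
}

my $data = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_';
$data .= $data for 1..14;    # 64 * 2^14 = 2^20 = 1048576 chars

my ($t_freeze, $frozen) = timed(sub { freeze(\$data) });
my ($t_thaw,   $thawed) = timed(sub { thaw($frozen)  });

say 'data   : ', length($data);
say 'intact : ', ($$thawed eq $data ? 'yes' : 'no');
printf "freeze : %.6f s\nthaw   : %.6f s\n", $t_freeze, $t_thaw;
```

Raising the loop count toward 24 brings the payload back to the 2^30 size used in this post; memory use still has to be observed externally (for example in top).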
MCE::Channel Demonstration:
MCE::Channel provides two-way communication and uses Sereal if available, otherwise defaults to Storable. For this demonstration, agents send data to the parent process via send2. Likewise, the parent receives data via recv2.
use strict;
use warnings;
use MCE::Child;
use MCE::Channel;
my $chnl = MCE::Channel->new();
sub agent_task {
    my ($id, @args) = @_;
    my $data = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_';
    $data .= $data for 1..24; # 2^30 1073741824
    # agent >> parent (via send2)
    $chnl->send2({ id => $id, data => $data });
}
my %procs;
MCE::Child->init( void_context => 1, posix_exit => 1 );
$procs{$_} = MCE::Child->create('agent_task', $_, 'arg1', 'argN') for 1..2;
while (keys %procs) {
    # parent << agent (via recv2)
    my $ret = $chnl->recv2;
    ( delete $procs{ $ret->{id} } )->join;
    printf "Agent %d: %d\n", $ret->{id}, length $ret->{data};
}
__END__
Agent 1: 1073741824
Agent 2: 1073741824
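The "Sereal if available, otherwise Storable" behaviour mentioned above can be imitated in plain Perl. The sketch below is not MCE::Channel's actual code, only an illustration of the same idea under my own assumptions: `pick_serializer` is a hypothetical helper, and it checks only that both Sereal modules load, without any version requirement.

```perl
use strict;
use warnings;
use feature qw(say);
use Storable ();

# Hypothetical helper: return (name, freeze coderef, thaw coderef),
# preferring Sereal when both modules load, otherwise Storable.
sub pick_serializer {
    if (eval { require Sereal::Encoder; require Sereal::Decoder; 1 }) {
        my $enc = Sereal::Encoder->new;
        my $dec = Sereal::Decoder->new;
        return ('Sereal',
                sub { $enc->encode($_[0]) },
                sub { $dec->decode($_[0]) });
    }
    return ('Storable', \&Storable::freeze, \&Storable::thaw);
}

my ($name, $freeze, $thaw) = pick_serializer();

# Same message shape as the agents send: a hash ref with id and data.
my $msg  = { id => 1, data => 'x' x 1024 };
my $copy = $thaw->( $freeze->($msg) );

say "serializer: $name";
say 'round trip: ', length $copy->{data};
```

Which name is printed depends on whether Sereal is installed; the round-trip length is 1024 either way.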
I did not encounter any serialization limits in these tests, even with payloads of more than 1 billion characters.
Regards, Mario