PerlMonks  

Re^5: Looking for alternative for IPC::Shareable (or increase size)

by marioroy (Prior)
on Aug 07, 2020 at 12:50 UTC


in reply to Re^4: Looking for alternative for IPC::Shareable (or increase size)
in thread Looking for alternative for IPC::Shareable (or increase size)

Greetings, DomX

I tried Storable and Sereal, followed by a parallel demonstration. Testing was done on macOS. To capture the memory consumption (using top -o CPU on macOS), uncomment the busy loop line. Sereal is not only faster but also consumes less memory.

Storable (updated):

    use strict;
    use warnings;
    use feature qw(say);

    use Storable qw(freeze thaw);

    my $data = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_';

    # $data .= $data for 1..10;  # 2^16      65536
    # $data .= $data for 1..11;  # 2^17     131072
    # $data .= $data for 1..12;  # 2^18     262144
    # $data .= $data for 1..13;  # 2^19     524288
    # $data .= $data for 1..14;  # 2^20    1048576
    # $data .= $data for 1..15;  # 2^21    2097152
    # $data .= $data for 1..16;  # 2^22    4194304
    # $data .= $data for 1..17;  # 2^23    8388608
    # $data .= $data for 1..18;  # 2^24   16777216
    # $data .= $data for 1..19;  # 2^25   33554432
    # $data .= $data for 1..20;  # 2^26   67108864
    # $data .= $data for 1..21;  # 2^27  134217728
    # $data .= $data for 1..22;  # 2^28  268435456
    # $data .= $data for 1..23;  # 2^29  536870912
      $data .= $data for 1..24;  # 2^30 1073741824

    say 'data : '.length($data);

    my $frozen = freeze(\$data);
    my $thawed = thaw($frozen);

    say 'frozen : '.length($frozen);
    say 'thawed : '.length($$thawed);

    # simulate busy loop: 4102 megabytes in top; 2.106 seconds
    # 1 for 1..400_000_000;

    __END__

    data : 1073741824
    frozen : 1073741844
    thawed : 1073741824

Sereal (updated):

    use strict;
    use warnings;
    use feature qw(say);

    use Sereal::Encoder qw(encode_sereal);
    use Sereal::Decoder qw(decode_sereal);

    my $data = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_';

    # $data .= $data for 1..10;  # 2^16      65536
    # $data .= $data for 1..11;  # 2^17     131072
    # $data .= $data for 1..12;  # 2^18     262144
    # $data .= $data for 1..13;  # 2^19     524288
    # $data .= $data for 1..14;  # 2^20    1048576
    # $data .= $data for 1..15;  # 2^21    2097152
    # $data .= $data for 1..16;  # 2^22    4194304
    # $data .= $data for 1..17;  # 2^23    8388608
    # $data .= $data for 1..18;  # 2^24   16777216
    # $data .= $data for 1..19;  # 2^25   33554432
    # $data .= $data for 1..20;  # 2^26   67108864
    # $data .= $data for 1..21;  # 2^27  134217728
    # $data .= $data for 1..22;  # 2^28  268435456
    # $data .= $data for 1..23;  # 2^29  536870912
      $data .= $data for 1..24;  # 2^30 1073741824

    say 'data : '.length($data);

    my $frozen = encode_sereal(\$data);
    my $thawed = decode_sereal($frozen);

    say 'frozen : '.length($frozen);
    say 'thawed : '.length($$thawed);

    # simulate busy loop: 3078 megabytes in top; 1.549 seconds
    # 1 for 1..400_000_000;

    __END__

    data : 1073741824
    frozen : 1073741837
    thawed : 1073741824

Sereal with compression enabled:

    use strict;
    use warnings;
    use feature qw(say);

    use Sereal::Encoder qw(encode_sereal);
    use Sereal::Decoder qw(decode_sereal);

    my $data = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_';

    # $data .= $data for 1..10;  # 2^16      65536
    # $data .= $data for 1..11;  # 2^17     131072
    # $data .= $data for 1..12;  # 2^18     262144
    # $data .= $data for 1..13;  # 2^19     524288
    # $data .= $data for 1..14;  # 2^20    1048576
    # $data .= $data for 1..15;  # 2^21    2097152
    # $data .= $data for 1..16;  # 2^22    4194304
    # $data .= $data for 1..17;  # 2^23    8388608
    # $data .= $data for 1..18;  # 2^24   16777216
    # $data .= $data for 1..19;  # 2^25   33554432
    # $data .= $data for 1..20;  # 2^26   67108864
    # $data .= $data for 1..21;  # 2^27  134217728
    # $data .= $data for 1..22;  # 2^28  268435456
    # $data .= $data for 1..23;  # 2^29  536870912
      $data .= $data for 1..24;  # 2^30 1073741824

    say 'data : '.length($data);

    my $frozen = encode_sereal(\$data, { compress => 1 });
    my $thawed = decode_sereal($frozen);

    say 'frozen : '.length($frozen);
    say 'thawed : '.length($$thawed);

    # simulate busy loop: 2104 megabytes in top; 2.170 seconds
    # 1 for 1..400_000_000;

    __END__

    data : 1073741824
    frozen : 52428830
    thawed : 1073741824

MCE::Channel Demonstration:

MCE::Channel provides two-way communication and uses Sereal if available, otherwise defaults to Storable. For this demonstration, agents send data to the parent process via send2. Likewise, the parent receives data via recv2.

    use strict;
    use warnings;

    use MCE::Child;
    use MCE::Channel;

    my $chnl = MCE::Channel->new();

    sub agent_task {
        my ($id, @args) = @_;
        my $data = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_';
        $data .= $data for 1..24;  # 2^30 1073741824

        # agent >> parent (via send2)
        $chnl->send2({ id => $id, data => $data });
    }

    my %procs;

    MCE::Child->init( void_context => 1, posix_exit => 1 );

    $procs{$_} = MCE::Child->create('agent_task', $_, 'arg1', 'argN') for 1..2;

    while (keys %procs) {
        # parent << agent (via recv2)
        my $ret = $chnl->recv2;
        ( delete $procs{ $ret->{id} } )->join;

        printf "Agent %d: %d\n", $ret->{id}, length $ret->{data};
    }

    __END__

    Agent 1: 1073741824
    Agent 2: 1073741824

I have not encountered any limitations with regard to serialization (> 1 billion chars).

Regards, Mario

Re^6: Looking for alternative for IPC::Shareable (or increase size)
by jcb (Parson) on Aug 07, 2020 at 22:31 UTC

    If I understand correctly, Sereal includes data compression, while Storable does not. Your test data is highly repetitive and therefore very compressible. Depending on deep details of your hardware, that compression may have made a significant difference by reducing the amount of data copied in some intermediate steps. (You did not output the length of $frozen, so I can only speculate here.) This may or may not be representative of our questioner's data or a meaningful comparison for non-trivial use.
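How much the choice of test data matters can be checked with a small sketch. It assumes zlib (via the core Compress::Zlib module) as a stand-in for whatever compressor a serializer uses, works on 1 MiB rather than 1 GiB to keep the run short, and uses a hypothetical helper named compressed_ratio. It compares the repeated 64-character alphabet against pseudo-random bytes:

```perl
use strict;
use warnings;

use Compress::Zlib qw(compress uncompress);

# Hypothetical helper: compressed size as a fraction of the input size.
sub compressed_ratio {
    my ($input) = @_;
    my $packed = compress($input);
    die "round trip failed" unless uncompress($packed) eq $input;
    return length($packed) / length($input);
}

my $size = 1 << 20;  # 1 MiB (assumption: small enough for a quick run)

# Highly repetitive input: the 64-character alphabet repeated.
my $alphabet   = join '', 'a'..'z', 'A'..'Z', '0'..'9', '-', '_';
my $repetitive = $alphabet x ($size / length $alphabet);

# Poorly compressible input: pseudo-random bytes.
srand(42);
my $random = join '', map { chr int rand 256 } 1 .. $size;

printf "repetitive : %.2f%% of original size\n", 100 * compressed_ratio($repetitive);
printf "random     : %.2f%% of original size\n", 100 * compressed_ratio($random);
```

The repetitive input should shrink to a tiny fraction of its size, while the random input should stay at roughly its original size. Real-world data will normally fall somewhere in between.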

    There is some breakeven point below which the CPU overhead of attempting to compress the data will exceed the cost of simply copying the data. 1GiB of repeated base64 alphabet is obviously well above that point, but that point will vary with real-world data. Algorithms that perform better in the general large case usually have more overhead, and sometimes that can make a significant difference in smaller cases. As an example, I once did an analysis on some code I had written in C and found (to my surprise) that the actual data used was small enough that linear search would be faster than binary search — and that code was in an innermost loop where small gains are worthwhile.

      Hi, jcb

      I updated the examples to output the frozen length. MCE::Channel does not enable compression when using Sereal.

      > 1 billion chars (2^30 1073741824)

      Storable
        runtime : 2.106 seconds
        memory  : 4,102 megabytes

      Sereal
        runtime : 1.549 seconds
        memory  : 3,078 megabytes

      Sereal with compression enabled
        runtime : 2.170 seconds
        memory  : 2,104 megabytes

      Thank you, DomX and jcb. ++ PerlMonks for this forum.

      Humble regards, Mario

        It seems obvious that the version using Storable is holding an extra copy of the data somewhere compared to the version using Sereal. The compression in Sereal is working reasonably well, reducing 1G of highly compressible input to about 52M stored.

        I still wonder if there really is a meaningful difference here — it should be equally possible to compress a frozen Storable image using any of the Compress:: modules to save memory or disk or network time, although it does appear that Sereal may have a slightly more efficient implementation (avoiding that extra copy of the data) that could be considered for adaptation to improve Storable as well. Maybe submit an enhancement request at the bug tracker for Storable?
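As a rough illustration of that idea, here is a minimal sketch that compresses a frozen Storable image with the core Compress::Zlib module. The helper names freeze_compressed and thaw_compressed are hypothetical and not part of Storable, and the data is cut down to 2^20 characters so the script finishes quickly:

```perl
use strict;
use warnings;
use feature qw(say);

use Storable qw(freeze thaw);
use Compress::Zlib qw(compress uncompress);

# Hypothetical helpers (not part of Storable): serialize, then compress,
# and the reverse. Both the frozen image and the compressed copy exist in
# memory for a moment, so this does not remove any intermediate copy.
sub freeze_compressed { return compress(freeze($_[0])) }
sub thaw_compressed   { return thaw(uncompress($_[0])) }

my $data = join '', 'a'..'z', 'A'..'Z', '0'..'9', '-', '_';
$data .= $data for 1..14;  # 2^20 1048576

my $frozen = freeze(\$data);
my $packed = freeze_compressed(\$data);
my $thawed = thaw_compressed($packed);

say 'data   : '.length($data);
say 'frozen : '.length($frozen);
say 'packed : '.length($packed);
say 'thawed : '.length($$thawed);
```

This only shows that the round trip works and that the stored image shrinks for compressible data. Whether it helps peak memory or runtime would need to be measured on the actual workload.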
