glasswalk3r has asked for the wisdom of the Perl Monks concerning the following question:
Greetings monks,
I've been doing some tests with IPC::Shareable and Parallel::ForkManager, and I'm having difficulty understanding how IPC shared memory works with forked processes.
The code reads a text file, builds a hash from its contents and compares each key's value with all the others using Text::LevenshteinXS. It looks simple, but I ran into some issues and questions.
First, to avoid comparing the same values more than once, I tried to lock the shared segment, delete the key, make a local copy of the hash in the child process and then release the lock. After running the code with 5 child processes, I got messages like this:
Corrupted storable string (binary v2.7) at ../../lib/Storable.pm (autosplit into ../../lib/auto/Storable/thaw.al) line 415, at /usr/share/perl5/IPC/Shareable.pm line 545
Is this a bug in the Storable module, or am I doing something wrong? If I increase the number of child processes, things get worse:
Munged shared memory segment (size exceeded?) at ./dup_finder2.pl line 45
Munged shared memory segment (size exceeded?) at ./dup_finder2.pl line 45
Object #3223600 should have been retrieved already at ../../lib/Storable.pm (autosplit into ../../lib/auto/Storable/thaw.al) line 415, at /usr/share/perl5/IPC/Shareable.pm line 545
Object #3223600 should have been retrieved already at ../../lib/Storable.pm (autosplit into ../../lib/auto/Storable/thaw.al) line 415, at /usr/share/perl5/IPC/Shareable.pm line 545
Corrupted storable string (binary v2.7) at ../../lib/Storable.pm (autosplit into ../../lib/auto/Storable/thaw.al) line 415, at /usr/share/perl5/IPC/Shareable.pm line 545
Corrupted storable string (binary v2.7) at ../../lib/Storable.pm (autosplit into ../../lib/auto/Storable/thaw.al) line 415, at /usr/share/perl5/IPC/Shareable.pm line 545
Since the code only ever reduces the amount of data in the shared memory segment (each child deletes a key), I don't understand how the data is getting munged.
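The stack traces show IPC::Shareable thawing the segment through Storable, and the "size exceeded?" message points at the segment size. A minimal sketch (not part of the original script) for checking how large a hash's Storable image is, assuming the segment holds something close to `freeze(\%hash)` and assuming a 65536-byte default segment; both are assumptions to verify against the `size` option in the installed IPC::Shareable documentation. The sample data is hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# Assumption: 65536 bytes stands in for IPC::Shareable's default segment size;
# check the `size` option documented for the installed version.
my $assumed_segment_size = 65536;

# Approximate what the shared segment must hold by freezing the hash with
# Storable (assumption: the tie stores an image close to this one).
sub frozen_size {
    my ($hash_ref) = @_;
    return length( freeze($hash_ref) );
}

# Hypothetical sample data standing in for the contents of contacts.txt.
my %sample = map { ( "id$_" => "Contact Name $_" ) } 1 .. 1000;

my $bytes = frozen_size( \%sample );
printf "%d keys freeze to %d bytes (%s the assumed %d-byte segment)\n",
  scalar( keys %sample ), $bytes,
  ( $bytes > $assumed_segment_size ? 'exceeds' : 'fits in' ),
  $assumed_segment_size;
```

Running the same check on a hash loaded from the real file gives a rough idea of whether the data could fit in the configured segment at all.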
Another issue: if I try to build a bigger hash in shared memory from a larger file (the one I used is 31 KB, the larger one is 519 KB), IPC::Shareable takes a very long time to load it. I didn't have the patience to wait for it to finish (if it ever would), but I did:
Is IPC really that slow, or is there a problem with IPC::Shareable?
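One factor worth checking, stated here as an assumption to confirm against the IPC::Shareable source: if every store into the tied hash re-serializes the whole hash, loading n keys costs roughly quadratic work instead of linear. The sketch below uses only Storable to compare that cost model with building the hash locally and freezing it once; the key and value formats are hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Storable qw(freeze);

# Total bytes serialized if the whole hash is re-frozen after every insert.
# Assumption: this models a tie that writes the entire structure on each STORE.
sub bytes_if_refrozen_per_store {
    my ($n) = @_;
    my %h;
    my $total = 0;
    for my $i ( 1 .. $n ) {
        $h{"id$i"} = "Contact Name $i";
        $total += length( freeze( \%h ) );
    }
    return $total;
}

# Bytes serialized if the hash is built in ordinary memory and frozen once.
sub bytes_if_frozen_once {
    my ($n) = @_;
    my %h = map { ( "id$_" => "Contact Name $_" ) } 1 .. $n;
    return length( freeze( \%h ) );
}

for my $n ( 250, 500, 1000 ) {
    printf "%5d keys: per-store %10d bytes, once %7d bytes\n",
      $n, bytes_if_refrozen_per_store($n), bytes_if_frozen_once($n);
}
```

If the per-store total grows roughly fourfold each time n doubles, that would be consistent with the long load time on the 519 KB file, but only if the assumption about how the tie writes holds.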
I'm also wondering whether I'm using the right tool to keep the data synchronized between the child processes. Would I get better results with MemcacheDB? Any suggestions are welcome.
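An alternative worth comparing against, sketched as an assumption rather than a known fix: if the contacts never change during the run, they can be loaded into an ordinary (untied) hash before forking, because each child gets its own copy at fork time, and the comparisons can be split so that each unordered pair is handled by exactly one worker. No shared state or locking is then needed. This uses core `fork`/`wait` instead of Parallel::ForkManager to stay self-contained; `pairs_for_worker` and the sample data are hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the [i, j] index pairs (i < j) that worker $k of $workers handles.
# Outer indices are dealt round-robin, so every unordered pair is assigned
# to exactly one worker and no coordination between workers is required.
sub pairs_for_worker {
    my ( $k, $workers, $n ) = @_;
    my @pairs;
    for ( my $i = $k ; $i < $n ; $i += $workers ) {
        push @pairs, [ $i, $_ ] for ( $i + 1 ) .. ( $n - 1 );
    }
    return @pairs;
}

# Hypothetical data; in the real script this would be a plain hash filled by
# get_db_data() BEFORE forking, which every child then inherits.
my %contacts = map { ( "id$_" => "name $_" ) } 1 .. 10;
my @ids      = sort keys %contacts;
my $workers  = 3;

for my $k ( 0 .. $workers - 1 ) {
    my $pid = fork();
    die "fork failed: $!\n" unless defined $pid;
    next if $pid;    # parent: go on to start the next worker

    # child: work only on its own share of the pairs, then exit
    my @mine = pairs_for_worker( $k, $workers, scalar @ids );
    printf "worker %d (pid %d) compares %d pairs\n", $k, $$, scalar @mine;
    exit 0;
}
1 while wait() != -1;    # parent reaps every child
```

Each child would then run the Levenshtein comparison on `$contacts{ $ids[$i] }` and `$contacts{ $ids[$j] }` for its own pairs and report matches.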
I'm running this code on Ubuntu Linux 10.04 with Perl 5.10.1.
Here is the code:
#!/usr/bin/perl
use warnings;
use strict;
use Text::LevenshteinXS qw(distance);
use Benchmark qw(timediff timestr);
use IPC::Shareable qw(:all);
use Parallel::ForkManager;

my $t0 = Benchmark->new();
my %contacts;
my $minimum_score = 80;
my $max_childs    = 5;
my $glue          = 1978;

my $db = tie %contacts, 'IPC::Shareable', $glue,
  { create => 1, exclusive => 0, destroy => 1 }
  or die "cannot tie contacts\n";

# :WARNING:21-10-2010:arfreitas: child process must receive this
$SIG{INT} = sub { die "$$ dying\n" };

# :WARNING:21-10-2010:arfreitas: must get registries AFTER creating the shared memory tie
get_db_data();

my @control = keys(%contacts);
print 'Starting processing data with maximum of ', $max_childs, ' childs', "\n";

my $manager = Parallel::ForkManager->new($max_childs);
$manager->set_max_procs($max_childs);
my $max_tries = 3;

foreach my $id (@control) {
    $manager->start() and next;
    my $tries = 0;
    if ( exists( $contacts{$id} ) ) {
        while (1) {
            if ( $db->shlock(LOCK_EX) ) {
                print "child $$ got a lock\n";
                # :BUG:21-10-2010:arfreitas: race condition when deleting keys of the tied hash
                my $testing = delete( $contacts{$id} );
                my %cache   = %contacts;
                unless ( $db->shunlock() ) {
                    print "child $$ could not release the lock\n";
                }
                else {
                    print "child $$ released the lock\n";
                }
                validate_contact( $id, $testing, \%cache );
                last;
            }
            else {
                print "child $$ can't lock\n";
                sleep 1;
                $tries++;
                last if ( $tries >= $max_tries );
            }
        }
    }
    else {
        print "child $$: somebody deleted key $id\n";
    }
    $manager->finish();
}

$manager->wait_all_children();

my $t1 = Benchmark->new();
my $td = timediff( $t1, $t0 );
print "\nThe code took: ", timestr($td), "\n";

sub validate_contact {
    my $id          = shift;
    my $testing     = shift;
    my $cache_ref   = shift;
    my $file        = 'tmp/processing-' . $id . '.log';
    my $source_name = $testing;
    my $counter     = 0;
    foreach my $contact ( keys( %{$cache_ref} ) ) {
        my $dest_name = $cache_ref->{$contact};
        my $dest_len  = length($dest_name);
        $dest_len = 1 unless ( $dest_len > 0 );
        my $distance = distance( $source_name, $dest_name );
        my $score = 100 - ( ( $distance * 100 ) / $dest_len );
        #how much equal is the current row with the original one
        if ( $score >= $minimum_score ) {
            print "$id looks like $contact\n";
        }
        $counter++;
    }
    print "child $$ compared id $id with $counter contacts\n";
}

sub get_db_data {
    my $file = 'contacts.txt';
    open( my $in, '<:utf8', $file ) or die "cannot read $file: $!\n";
    while (<$in>) {
        chomp;
        my @temp = split( /\|/, $_ );
        my $id = splice( @temp, 1, 1 );
        $contacts{$id} = $temp[0];
    }
    close($in);
    print "OK\n";
    return;
}
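For reference, the score computed in validate_contact() can be reproduced without the XS module. The sketch below swaps in a plain dynamic-programming Levenshtein implementation in place of Text::LevenshteinXS::distance; `levenshtein` and `similarity_score` are hypothetical helper names, not part of any library:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(min);

# Pure-Perl Levenshtein distance (two-row dynamic programming), standing in
# for Text::LevenshteinXS::distance so the example runs without the XS module.
sub levenshtein {
    my ( $s, $t ) = @_;
    my @prev = ( 0 .. length $t );
    for my $i ( 1 .. length $s ) {
        my @cur = ($i);
        for my $j ( 1 .. length $t ) {
            my $cost = substr( $s, $i - 1, 1 ) eq substr( $t, $j - 1, 1 ) ? 0 : 1;
            push @cur, min( $prev[$j] + 1, $cur[ $j - 1 ] + 1, $prev[ $j - 1 ] + $cost );
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# Same formula as validate_contact(): 100 minus the distance expressed as a
# percentage of the destination length (length forced to at least 1).
sub similarity_score {
    my ( $source, $dest ) = @_;
    my $len = length($dest) || 1;
    return 100 - ( levenshtein( $source, $dest ) * 100 ) / $len;
}

printf "%.1f\n", similarity_score( 'Jon Smith', 'John Smith' );    # prints 90.0
```

Because the score divides by the destination length only, comparing A with B and B with A can give different scores, so skipping the reverse comparison may change which matches are reported.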
Thank you in advance,