Tanktalus has asked for the wisdom of the Perl Monks concerning the following question:

I'm facing an issue with my parallel-processing code that has been very frustrating (currently perl 5.8.8, but I will be pushing for 5.14.0, or newer, as soon as I can). It's frustrating in a few ways, not least of which is that it's only intermittently repeatable (it happens rarely - often rerunning without changing anything will get past the issue), it has never happened on my Linux box, only on AIX, and I can't figure out a small piece of code that repeats it.

The code that actually gets the error message is in a piece of code that pseudo-JSONises a data structure for logging purposes. Encoding to JSON-ish output is actually fairly straight-forward, and mine is recursive (a table lookup of what to do next, head to that function to handle the current item, which may call back to the table lookup for embedded objects). In this case, it's handling a simple scalar:

sub _scalar_to_string { my $s = shift; return 'null' unless defined $s; if (ref $s) # object other than plain HASH, ARRAY, or SCALAR? { #... (this is something JSON doesn't handle) } elsif (looks_like_number($s)) { return $s } else { eval { $s = "" . $s; $s =~ s[\\][\\\\]g; ### error points here $s =~ s["][\\"]g; $s =~ s[\n][\\n]g; $s =~ s[[\b]][\\b]g; $s =~ s[\f][\\f]g; $s =~ s[\r][\\r]g; $s =~ s[\t][\\t]g; $s =~ s[([^[:print:]])][sprintf "\\u%04x", ord $1]eg; 1; } and return qq["$s"]; return qq[":$@:$s"]; } }
As you can see, I've already found the error occuring, and wrapped it in an eval, so that I could log the error. The output looks like this:
End process handler: ":Modification of a read-only value attempted at +$file line 228. :172.23.1.10"
The first confusion is how the heck $s could be readonly when I've assigned to it - twice. It's not like it should be an alias of the input, even if that were readonly (which I'm not sure how that's happening either). If there's any way to eliminate this error, that would be very helpful.

The second issue is in the caller. The whole thing would likely work better in Coro than it does right now, but, since we're actually shipping this code to customers, the lawyers want to spend way too much time on "due diligence" - so I don't have approvals yet. Even AnyEvent would be an improvement, I suspect. Regardless, the basic idea is that I'm running ssh to each of a bunch of machines (via IPC::Open3), using an IO::Select loop to read stdout/stdin from each ssh, and a SIGCHLD handler to suck in the return codes. I'm thinking that the SIGCHLD must be coming back at just the wrong time, and a hash being used to track all of this is getting clobbered: a value from the hash is pushed into an array to track children that need to be handled while that value is being deleted:

# keep track of the reaped children so we can deal with th +em # later. push @reaped, delete $pids{$child};
I'm not sure if a SIGCHLD could come in to interrupt this - what it then looks like is that %pids has the key still, but the value is undef ("null" in the trace file). And when I see this in the trace file, I get the read-only error as well, even though the trace is showing that it does have the right value somehow.

Note that the stuff below doesn't quite work because JSON can't handle CODE refs as far as I can tell. Other than that, it's a close approximation of the code - but I still can't force it to fail on demand.

#!/usr/bin/perl use JSON; sub UNIVERSAL::TO_JSON { "@_" } use strict; use warnings; use Time::HiRes qw(sleep); use Scalar::Util qw(refaddr weaken readonly); use POSIX qw(:sys_wait_h); my $sub_kids = 30; $|=1; sub run { my %rcs; my %pids; my $reaper; my @reaped; $reaper = sub { my $child = shift; my $rc = shift; if ($pids{$child}) { $rcs{$pids{$child}}{rc} = $rc; print "Child: $child RC: $rc ITER: $pids{$child}\n"; # keep track of the reaped children so we can deal with th +em # later. push @reaped, delete $pids{$child}; print 'reaped list: ', encode_json(\@reaped), "\n"; # JSON + is very compact cf Data::Dump et al return 1; } return 0; }; register_sigchild($reaper); my $dealt_with = 0; my $deal_with_children = sub { ++$dealt_with; my $child = shift @reaped; if(readonly($child)) { print "********child readonly?**************\n" ; die; } print "Dealing with: ", encode_json([$child]), "\n"; print " - got: ", encode_json($rcs{$child}), "\n"# if $child; # code shouldn't make a difference here. }; for my $kid (1..$sub_kids) { my $pid = run_async($kid % 2 ? '/bin/true' : '/bin/false'); $pids{$pid} = $kid; } while (@reaped or keys %pids) { print "Waiting for children to end: ", encode_json(\%pids), "\ +n"; sleep 10 unless @reaped; while (@reaped) { $deal_with_children->(); } } unless ($dealt_with == $sub_kids) { warn "Only dealt with $dealt_with of $sub_kids?"; } unregister_sigchild($reaper); } sub run_async { # normally, I use IPC::Open3, but here we're just being quick/dirt +y. # with that, we're assuming fork will work. my $pid = fork(); return $pid if $pid; die "failed to fork" unless defined $pid; sleep 0.01 * rand(50); exec @_; die "Huh? exec failed?" } my %sigs; sub register_sigchild { my $sig = shift; $sigs{refaddr $sig} = { sig => $sig, from => (caller(1))[3], }; weaken($sigs{refaddr $sig}{sig}); $SIG{CHLD} = \&handle_sigchild; } my @buffer; sub handle_sigchild { my $child; unregister_sigchild(); while (($child = 0+ waitpid(-1, WNOHANG)) > 0) { my $rc = 0+ $?; print "Child: $child RC: $?\n"; push @buffer, [$child, $rc]; } my @exited = @buffer; @buffer = (); for my $exit (@exited) { for my $sig (values %sigs) { my $last; my $ref = $sig->{sig}; # ensure strong reference now next unless $ref; if ( eval { $last = 1 if $ref->(@$exit); 1; } ) { last if $last; } else { delete $sigs{refaddr $ref}; } # no one handled yet? push @buffer, $exit; } } unregister_sigchild(); } sub unregister_sigchild { if (@_) { delete $sigs{refaddr $_[0]} if $_[0]; $_[0] = undef; # action at a distance! } for my $addr (keys %sigs) { delete $sigs{$addr} unless defined $sigs{$addr}{sig}; } print 'About to set $SIG{CHLD}', "\n"; $SIG{CHLD} = keys %sigs ? \&handle_sigchild : 'DEFAULT'; #print 'SIGS: ', encode_json(\%sigs), "\n"; } run();

Replies are listed 'Best First'.
Re: Modification of read-only value
by onelesd (Pilgrim) on Jun 13, 2011 at 21:23 UTC

    Just a thought, but it seems like this could be done more simply (less code written by you) with threads.

    You should also upgrade your perl, since there is a lot of updated threads-related code since 5.8.

      I'm not sure how much less code it would be with threads since threads don't easily share data between them, and I need to pull all the data from all the subprocesses back together at the end. So I'd be exchanging one large chunk of code for another. In general, I'm more familiar with IPC::Open3 and IO::Select than ithreads, so I chose this way due to familiarity. Now, if that's the only option I have for eliminating this intermittent issue, I'm more than willing to try it, but it'll be hard to prove ;-)

      As for upgrading perl, I thought I addressed that in the original node. I will be pursuing the upgrade of perl, regardless of this issue, anyway. But it'll take time to get through all the lawyers and managers.

        By all means, go with what's comfortable. I didn't really grok all your code, but personally I hate thread management, and I like that "threads" takes care of a lot of that for you. And, "threads::shared" handles the passing and locking of shared data quite well.

        I saw you mention the perl upgrade issue in your OP - I kicked it again because I've had intermittent thread issues in the past that simply disappeared by upgrading perl from 5.8 to 5.12.

Re: Modification of read-only value
by Khen1950fx (Canon) on Jun 13, 2011 at 21:18 UTC
    At line 110: WNOHANG: I got an undefined subroutine warning. I changed it to \&WNOHANG. Works fine.
      Arbitrarily substituting in the address of an undefined symbol for a missing constant is completely wrong.
Re: Modification of read-only value
by Anonymous Monk on Jun 14, 2011 at 04:49 UTC
    FWIW, I can't replicate your problem, but my platform uses fork emulation , and when I run your program no reaping occurs
    $ perl -d:Modlist -le " use JSON; use Time::HiRes; use Scalar::Util; u +se POSIX; print $^V; print $^O; " v5.12.2 MSWin32 AutoLoader 5.71 Carp 1.17 Config Config_git.pl Config_heavy.pl DynaLoader 1.10 Exporter 5.64_01 Exporter::Heavy 5.64_01 Fcntl 1.06 JSON 2.53 JSON::XS 2.3 List::Util 1.23 POSIX 1.19 Scalar::Util 1.23 Tie::Hash 1.03 Time::HiRes 1.9724 XSLoader 0.15 attributes 0.12 base 2.15 common::sense 3.4 constant 1.21 overload 1.10 vars 1.01 warnings 1.09 warnings::register 1.01
Re: Modification of read-only value (XS--)
by tye (Sage) on Jun 14, 2011 at 04:30 UTC

    Switch from JSON (which defaults to thunking to JSON::XS) to JSON::PP and watch this mysterious behavior mysteriously disappear.

    No, I can't guarantee such results. But my batting average on that prediction is actually pretty high so far.

    - tye        

      Unfortunately, this one isn't improving your batting average :-)

      $ perl5.8.8 /tmp/z.pl About to set $SIG{CHLD} About to set $SIG{CHLD} encountered CODE(0x10ec290), but JSON can only represent references to + arrays or hashes at /tmp/z.pl line 162 encountered CODE(0x10ec290), but JSON can only represent references to + arrays or hashes at /tmp/z.pl line 162
      Or is this not the mysterious behaviour you're referring to, as I did have two mysterious behaviours to deal with here? Looks like I'm going to have to keep with my custom JSON-ish output until I find a better dumper :-) (Which is getting woefully off-topic for this thread, but I'll point out that I like the conciseness of JSON, but need some more flexibility, e.g., dumping stuff other than scalars, arrays, hashes, and nulls, but at the same time, not dumping all the contents of every object like Data::Dump(er) are wont to do.)

      However, I don't think this is related to my original problem where I'm not using JSON::XS, either, but my own pseudo-JSON-ish generator. The read-only error crops up in that generator code (thus why it shows up in my original node), but it's called from the signal handling code, where I'm suspicious the real error comes from.

        However, I don't think this is related to my original problem where I'm not using JSON::XS, either

        Oh, the big block of code is for a separate process than the one getting the mysterious error referenced in the node title (modification of read-only value) ? The only code you give us for the process that experiences the in-title error is just the one subroutine? So the vast majority of the write-up is about something not alluded to in the title? I guess you seriously and thoroughly buried the lead.

        You lost me in the rambling about political problems and making excuses for not doing some things some other way and described some stuff that didn't make much sense to me on first reading but then said that it was the point at which the original error happens and so I didn't think it was a separate problem, just more explanation about where you see the one problem.

        So I didn't try to re-read it and make sense of it all since the in-title error (that you described quite nicely) is not something that happens because of anything you do in Perl code. It happens because somebody did something stupid or wrong in C code.

        You either have found a bug in Perl or you have found a bug in a module that uses XS code. The odds between those two buckets are very, very lopsided. (Or you have disabled "safe signal handlers" without doing significant amounts of work to compensate for each invocation of an unsafe signal handler having a 10% chance of just making Perl marginally insane -- but I see no mention of "safe" so I assume that isn't the case.)

        Find the module that you are using that uses the most XS code and replace it with something else. You'll have pretty good odds (IME) of the (in-title) error going away.

        So, what is the huge block of code that you appear to never describe beyond calling it "the stuff below"? I have no idea at this point. I (rather blithely, it appears) assumed that it was code that actually had something to do with the error in the title. If it isn't code that actually reproduces the error, I thought it would be rather clearly labeled as not such.

        As for using JSON for more arbitrary data, we wrote a data sanitizer that replaces objects that define stringification with their stringified value, replaces code references with { CODE => getNameOfSubroutine($ref) }, replaces scalar references with { SCALAR => $$ref }, etc. so that we get the important information and not lots of noise and just pass that in to be formatted as JSON.

        Of course, one of the really nice things about the JSON format is that it is so extremely simple. Rolling your own version of it shouldn't take more than a couple of dozen of lines of code so one shouldn't feel too shy about doing that when there is reason to.

        - tye