Beefy Boxes and Bandwidth Generously Provided by pair Networks
Come for the quick hacks, stay for the epiphanies.
 
PerlMonks  

Re: Help requested to find the cause of an "uninitialized value" message

by Marshall (Canon)
on Jan 24, 2022 at 16:36 UTC ( [id://11140809]=note: print w/replies, xml ) Need Help??


in reply to Help requested to find the cause of an "uninitialized value" message

It would be helpful if you could share more details about how the parent and the child communicate. What does "This hash is then transacted back to the parent" mean? Could be some sort of unlocked shared data structure problem that only very, very rarely causes a problem. The problem is likely to be in the inter thread or inter process communication and a simple "breakpoint" that just shows you that things are screwed up at the moment may not be that helpful in figuring out how that happened in the first place. This does not sound a like a simple logic error.

Update: It is possible to install your own WARN signal handler. SIG handler hash and I have done so on a localized basis for certain unusual situations. If you do that, then you can dump out probably a bunch of state information about what was going on at the time that this occurred. This might give some clue of where to look? In the best case, this is a simple logic error which somehow causes the result to the parent to be undefined. More difficult is when there is a rare communications error. Sometimes only serious code inspection finds these things.

I remember one problem that actually lasted a couple of years. We knew from field reports that a particular error was happening but we couldn't find the cause. We ran hundreds of millions of test cases on lab machines and were never able to reproduce the problem. In a rather successful delaying tactic, we offered the field a $10K "bounty" if they could find a situation that was reproduceable. Of course we knew that this would never happen, but if by a miracle that it did, we would have happily paid the $10K! The solution was only found by detailed code inspection, an unlocked shared data structure. I deal where like x=3 the y=4. If a context switch happened exactly between those 2 "writes to memory", bingo, the problem happened because x and y were inconsistent. The time frame where the system was "vulnerable" was extremely small. But if you do enough operations, something that happens once per billion or 100 billion will show up!

Anyway a communications failure between parent and child is a possible cause. Something that "shouldn't happen" on a logical basis, but does happen albeit very rarely due to the timing of locks (or lack thereof) and sharing information between threads/processes.

  • Comment on Re: Help requested to find the cause of an "uninitialized value" message

Replies are listed 'Best First'.
Re^2: Help requested to find the cause of an "uninitialized value" message
by andyok (Novice) on Jan 25, 2022 at 06:44 UTC
    Thank you for your response. I'll do my best to add more detail (without being too detailed).
    So at script startup-time, the parent immediately forks the child:
    $socket = socketpair($child_fh, $parent_fh, .....); $child_fh -> autoflush(); $parent_fh -> autoflush(); $pid = fork(); ....

    So the child and parent can use these file-handles to send and receive.
    In the child process:
    while(1) { my %INFO = (); .... stuff-which-populates-%INFO ... my $dumped = Dumper(\%INFO); print $parent_fh ":MSG:${dumped}:DONE:\n"; sleep 10; }
    %INFO will always be populated ... it will never be null.
    I am mindful that $dumped could be upwards of 10Kbytes.
    Hence, I frame $dumped with :MSG: and :DONE:.

    The parent process is a Perl/Tk GUI and so uses fileevent() on $child_fh to invoke a callback whenever there is activity on the file-handle.
    So the call-back is of the form:
    state $collected; $num = read($child_fh, $buf, 1e6); if ($num) { $collected .= $buf; if ($collected =~ /:MSG:(.*):DONE:/s) { $dumper = $1; %info = %{ eval $dumper } $collected = ""; } }
    I make sure that the opening and closing framing text (:MSG: and :DONE:) is present, just in case the data is transacted in dribs-and-drabs.
    As I write this, I guess it is possible that the asynchronous nature of data being collected in $collected could be of the form:
    :MSG:<Dumped-data-from-%INFO>:DONE::MSG:<Dumped-data-from-another-%INFO>:DONE:
    And therefore the pattern match will match: <Dumped-data-from-%INFO>:DONE::MSG:<Dumped-data-from-another-%INFO>
    which is not in the correct format for the eval.
    At the time, I discounted this because of the autoflush ... but perhaps that is a false assumption.

    Following on from some of the tips provided, I have inserted: local $SIG{__WARN__} = \&WhatsWrong; and I have defined a function WhatsWrong which can then print out the value of $dumper.
    It will take a few weeks for the problem to occur, so I will keep you updated when I have more information.
    Thank you very much.
      And therefore the pattern match will match: <Dumped-data-from-%INFO>:DONE::MSG:<Dumped-data-from-another-%INFO>

      That's definitely something I would recommend you look into. Another possibility is that $collected is something like ":MSG:stuff:DONE::MSG:stuff" when you do $collected = "", breaking the stream, or that :MSG: and :DONE: are a part of the Dumper output. Although I still don't recommend using Data::Dumper and especially eval for this, here are some steps you can take. First, make your start and end markers a little more robust: I think for example Data::Dumper wouldn't output a "\0" when its Useqq option is turned on (and you can even validate this by checking the string before you send it). Second, use a s/// regex to remove output you get from $collected instead of completely clearing it, and/or validate the format of $collected before you do.

      If you still have trouble, it would be great if you could combine the tidbits of code you've provided into an SSCCE that reproduces the issue, then I'm sure we can get it debugged pretty quick.

      Update: A few minor additions to the above.

        Thank you for your comments.
        I have created a small test-case which at least causes the same error-message to be generated.
        (I appreciate that many causes may generate the same error-message).

        In my small test-case, if I have a "corrupted" $dumped that is passed to eval, then I get the same error-message.
        And by corrupted, I mean:
        • that the collection/appending of snippets of string from read() onto $collected
        • followed by the extraction of one framed message
        • followed by the complete nulling of $collected without testing to see if there is a residue of text that should be preserved
        could result in a ill-formated string that gets passed to eval.

        And, in such an instance, I note that the result of the eval results in a %info that is empty.

        (It never crossed my mind to consider that %info = %{ eval $dumper}</code} would result in a null %info.)
        So this then becomes another possible trigger for break-pointing or printing out extra information at/around the failure point.

        Thank you for your thoughts and time.
        Andrew
      Are you on Windows or Unix? That might have an influence.

      I was wondering if there was some scenario where %INFO could actually be an empty hash. Thanks for clarifying that you've thought about that. Also, the possible 10K message is significant.

      In your child, the autoflush part looks good to me. However, I am not sure exactly what will happen with more than one send buffer's worth of data. Probably the buffer size is 4K as a good guess. The "print" will not be "atomic". Could be separate 4K, 4K, then 2K writes.

      In your parent, $num = read($child_fh, $buf, 1e6); you are assuming that all of the data that the child sent is available at once. Usually that will be true (what the child does is very fast), but sometimes that may not be true. What happens if you read the pipe between the sending of one of child's buffers? You would get everything sent so far, but perhaps not all that will be sent by the child. If the child has only put a single 4K packet into the pipe and is still working on the next one, your statement will just ignore the data that is to come.

      Your parent read loop should read data until is sees ":DONE:". It probably would be ok to use a blocking call for this on a line by line basis. The reasoning is that the child's transmission job is fast. If a buffer containing the lines received don't start with ":MSG:", then you have a malformed message. I see no reason for state $collected; I would just use a normal "my" variable with appropriate scope.

      You actually have a fair amount of time that can be spent in the parent read routine. You are reading this pipe because you got a notification that at least one byte was available. The child will be sending all of the other bytes forthwith. Make sure that you are getting the entire message. The GUI user will not notice a couple of ms of delay.

      Update: If I understand correctly, you are getting about 8,640 messages per day (once every 10 seconds). One out of 20-30K messages fails? This one time per 25K sounds like an inter-process com failure. I would put some code in to detect "malformed message" (got DONE, but buffer doesn't start with MSG).

      Update: I was thinking about this some more and if you were randomly polling the client, this kind of thing wouldn't happen nearly as often as once per 20K times - you are seeing a problem "relatively often". However, you have special "push" situation that increase the probability of an error. The parent will be notified very soon after a large data buffer is starting to be written to the pipe - I would think at least after the first 4k. The parent could get to the pipe before the client finishes sending. You could wind up with a partial message or a message containing fragments from a previously not completely received message.

      I very much like the idea of what you did with adding :MSG: and :DONE: The parent should wind up synchronized so that it only gets complete messages, one at a time. After you see a :DONE: line, I don't think that there is any tricky stuff that you need to do in case there is another message "stacked behind" that one. The client only sends these things once every 10 seconds. When you see the :DONE: line, if the buffer doesn't have :MSG: right at the start of it, then you have a framing error. If there are no framing errors, then we into a different kind of mystery. Once established, you can assume that a connection like you have is without transmission errors.

      Change the parent read loop to look for ":DONE" do not assume that a single read will do that.

      To be more specific, (untested of course), in the child:
      put beginning and ending key words on separate lines
      change  print $parent_fh ":MSG:${dumped}:DONE:\n";
      to print $parent_fh ":MSG:\n${dumped}\n:DONE:\n";

      In the parent - something like this...:

      # OK to use blocking calls, the data will be there # don't read past the "DONE", wait for next file event # otherwise this will "hang". my $msg; while (my $line = <$child_fh>) { $msg =. $line; last if $line =~ /^:DONE:\s*$/; } if ($msg !~ /^:MSG:/) {framing error!} do what you need to do... $msg = undef;
      I assuming that there is no real performance issue here at all. And if this is 10K chars per 10 secs, there won't be.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://11140809]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others sharing their wisdom with the Monastery: (3)
As of 2024-04-16 22:25 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found