PerlMonks  

Re: Win32 - Memory can not be "read"

by jbert (Priest)
on Oct 05, 2006 at 08:21 UTC


in reply to Win32 - Memory can not be "read"

Are the boxes under quite a lot of load? Does this load vary during the day, and does the occurrence of the problem relate to it in any way?

You mention that the scripts are spawning children. Are they waiting for them to exit or running in parallel with them? How about the child scripts? Does task manager show a shedload of child processes running?

If you can't capture the problem at the time, turn on performance monitoring and graph these things over the day, so you can look for spikes/ceilings around the time of the problem. Also, don't just look at the bad machines; duplicate all the measurements on the 'good' ones and look for differences.

Do the machines tend to get sick at approximately the same time (which would suggest an external factor, i.e. the network)? Look at the network topology: do the sick machines share any factors there (same switch?)?

Some possibilities: 'memory' (vm exhaustion?), number of handles per process, maximum process stack depth (not perl stack, the underlying C stack), number of threads per proc, total number of procs running on the box, total number of threads on the box, etc.

Since it's intermittent, it could be a more classic race, caused by general slowdown etc. So, lastly, can you 'induce' an episode by adding some load to one of your machines? Try different types of load (a cpu burner, a mem hog, a ping flood, a process which starts a lot of children).
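One of the load types above (a process that starts a lot of children, each burning CPU) could be sketched roughly as follows. This is only an illustration: `spawn_burner` and `run_load` are made-up names, and the Windows branch relies on the Win32-only `system(1, ...)` form described in perlport, which starts a process without waiting for it.

```perl
use strict;
use warnings;

# Hypothetical helper: start one CPU-burning child perl and return its PID.
sub spawn_burner {
    my ($seconds) = @_;
    my @cmd = ($^X, '-e',
        'my $end = time() + $ARGV[0]; 1 while time() < $end;', $seconds);

    if ($^O eq 'MSWin32') {
        # Win32-only form of system(): spawn without waiting, return the PID.
        return system(1, @cmd);
    }

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        exec @cmd or die "exec failed: $!";
    }
    return $pid;
}

# Start $count burners in parallel, wait for all of them, and
# return a hash reference of PID => exit code.
sub run_load {
    my ($count, $seconds) = @_;
    my @pids = map { spawn_burner($seconds) } 1 .. $count;
    my %status;
    for my $pid (@pids) {
        waitpid($pid, 0);
        $status{$pid} = $? >> 8;
    }
    return \%status;
}
```

Something like `run_load(20, 60)` would keep twenty children spinning for about a minute; the numbers are arbitrary and would need tuning against what the build boxes normally see.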

It sounds like a build environment, so I wouldn't imagine the perl is too complex - is this right? Or is there a lot of hairy code in there? And again, what about the child processes?

Hope this helps, intermittent probs are always tough.

Good luck.

Replies are listed 'Best First'.
Re^2: Win32 - Memory can not be "read"
by HuckinFappy (Pilgrim) on Oct 05, 2006 at 14:53 UTC
    Thanks for all the ideas jbert. The machines are under relatively significant load, but not in terms of number of processes. These dual processor machines have the following on them:
    • A perl client script (well, 2...one per processor)
    • Client polls server to get next available build job
    • Client spawns perl script (and then waits for it) which:
      1. converts make template into make file
      2. runs make
    • client notifies server job is complete and gets next one
    So there's not a lot going on in parallel, but the network/cpu/memory can get pretty hammered (the local disk is almost a noop in most of these cases).

    The perl is slightly more complicated than you see in most build environments, but it's not rocket science by any means.
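The poll/spawn/wait/notify cycle described above might look roughly like this. It is a sketch under assumptions, not the actual client: `run_job` and `client_loop` are invented names, and the two callbacks stand in for the real client/server protocol, which the thread does not show.

```perl
use strict;
use warnings;

# Run one build job in a child perl and wait for it to finish.
# $script and its arguments are hypothetical.
sub run_job {
    my ($script, @args) = @_;
    system($^X, $script, @args);    # blocks until the child exits
    return -1 if $? == -1;          # the child could not be started
    return $? >> 8;                 # exit code (signal bits ignored here)
}

# Keep fetching jobs until $next_job returns undef.
# $next_job returns an array ref [script, args...]; $report is told
# the job and its exit code once the child has finished.
sub client_loop {
    my ($next_job, $report) = @_;
    my $completed = 0;
    while (defined(my $job = $next_job->())) {
        my $exit = run_job(@$job);
        $report->($job, $exit);
        $completed++;
    }
    return $completed;
}
```

Because `system` does not return until the child has exited, only one build runs per client at a time, which matches the "not a lot going on in parallel" description.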

    I was able last night to re-enable the pop-ups, and caught some data this morning. It's Greek to me, but I'll include it here for the sake of completeness. First, the last few frames of the stack trace (I have it all for anyone interested):

    >msvcrt.dll!77c46fa3()
     perl58.dll!Perl_newFOROP(interpreter * my_perl=0x00225ffc, long flags=0, char * label=0x01867624, unsigned long forline=32, op * sv=0x00000000, op * expr=0x018956f4, op * block=0x018955dc, op * cont=0x00000000) Line 3877 + 0x9 C
     perl58.dll!Perl_yyparse(interpreter * my_perl=0x0173adfc) Line 257 + 0x18 C
     perl58.dll!S_doeval(interpreter * my_perl=0x01890668, int gimme=0, op * * startop=0x00000000, cv * outside=0x00000000, unsigned long seq=2630) Line 2817 + 0x6 C
     perl58.dll!Perl_pp_require(interpreter * my_perl=0x0167a5ac) Line 3314 + 0x3a C
     perl58.dll!Perl_runops_standard(interpreter * my_perl=0x00225ffc) Line 23 + 0xc C
    And the disassembly of the memory around where it faulted:
    77C46F9B  and      edx,3
    77C46F9E  cmp      ecx,8
    77C46FA1  jb       77C46FCC
    77C46FA3  rep movs dword ptr [edi],dword ptr [esi]
    77C46FA5  jmp      dword ptr [edx*4+77C470B8h]
    77C46FAC  mov      eax,edi
    77C46FAE  mov      edx,3
    77C46FB3  sub      ecx,4
    I did go look up op.c, which is where Perl_newFOROP() is defined, and found line 3877, which is:
    Copy(loop,tmp,1,LOOP);

    As I say, it's Greek to me, but maybe someone sees something significant here?

    Thanks to all, I know we're on the fringe of "is this a perl issue or a Windows problem", which means we're moving from something I know something about into the huge unknown for me (completely linux-centric)

      perl58.dll!Perl_newFOROP(            ??? Doing a for loop ???
          interpreter * my_perl=0x00225ffc,
          long flags=0,
          char *label=0x01867624,
          unsigned long forline=32,        ??? Does this mean line 32 of the source file ???
          op * sv=0x00000000,
          op * expr=0x018956f4,
          op * block=0x018955dc,
          op * cont=0x00000000
      ) Line 3877 + 0x9 C
      perl58.dll!Perl_yyparse(             ??? parsing ???
          interpreter * my_perl=0x0173adfc
      ) Line 257 + 0x18 C
      perl58.dll!S_doeval(                 ??? Evaling the code ???
          interpreter * my_perl=0x01890668,
          int gimme=0,
          op * * startop=0x00000000,
          cv * outside=0x00000000,
          unsigned long seq=2630
      ) Line 2817 + 0x6 C
      perl58.dll!Perl_pp_require(          ??? Loading a module ???
          interpreter * my_perl=0x0167a5ac
      ) Line 3314 + 0x3a C
      perl58.dll!Perl_runops_standard(
          interpreter * my_perl=0x00225ffc
      ) Line 23 + 0xc C

      The stack trace suggests that the problem occurs when you are processing a for loop. And the assembler instruction where the trap is occurring is the rep movs ..., which supports that.

      From that, I'd hazard a wild guess that you have a for loop that is running off the end of the data it is copying, and from the stack trace, it looks like that for loop is located in a module--maybe at line 32 of the file?

      Like I say, that's mostly guesswork, but if your script doesn't have too many dependencies it might be worth looking at line 32 of each of them and seeing if there is a for loop there, before dismissing this completely :)
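If there are more than a handful of dependencies, the "look at line 32 of each file" check could be automated along these lines. This is a rough sketch: `for_loop_at` and `scan_loaded_modules` are made-up names, and the keyword match is a crude heuristic that will also hit the word "for" inside a comment or string.

```perl
use strict;
use warnings;

# Return the text of line $line_no in $path if it contains a
# for/foreach keyword, otherwise undef.
sub for_loop_at {
    my ($path, $line_no) = @_;
    open(my $fh, '<', $path) or return;
    while (my $text = <$fh>) {
        next unless $. == $line_no;
        close $fh;
        return $text =~ /\b(?:for|foreach)\b/ ? $text : undef;
    }
    close $fh;
    return;
}

# Check every file perl has loaded so far; %INC maps each loaded
# module name to the path it was read from.
sub scan_loaded_modules {
    my ($line_no) = @_;
    my %hits;
    for my $module (sort keys %INC) {
        my $path = $INC{$module};
        next unless defined $path && -f $path;
        my $text = for_loop_at($path, $line_no);
        $hits{$path} = $text if defined $text;
    }
    return \%hits;
}
```

Calling `scan_loaded_modules(32)` after the build script's own `use` and `require` statements have run would list the candidate files and the matching line from each.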


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
        Wow...it's amazing what we learn on Perlmonks! :)

        Thanks, I took your advice, and did some digging. Will you be too surprised to learn I found a for() loop on line 32 of a module?

        It's not our module though...it's Exception::Class. Here's the beginning of the code in that module (ending at the for() statement on line 32):

        package Exception::Class;
        use 5.005;
        use strict;
        use vars qw($VERSION $BASE_EXC_CLASS %CLASSES);
        BEGIN { $BASE_EXC_CLASS ||= 'Exception::Class::Base'; }
        $VERSION = '1.19';
        sub import {
            my $class = shift;
            local $Exception::Class::Caller = caller();
            my %c;
            my %needs_parent;
            while (my $subclass = shift) {
                my $def = ref $_[0] ? shift : {};
                $def->{isa} = $def->{isa} ? ( ref $def->{isa} ? $def->{isa} : [$def->{isa}] ) : [];
                $c{$subclass} = $def;
            }
            # We need to sort by length because if we check for keys in the
            # Foo::Bar:: stash, this creates a "Bar::" key in the Foo:: stash!
          MAKE_CLASSES:
            foreach my $subclass ( sort { length $a <=> length $b } keys %c ) {
        The only place we do a 'use Exception::Class', it is accompanied by a long list of our own classes of Exception we define.

        Again, I'm not sure what this means, but it feels like a red herring, since this code works 99.99% of the time and is exercised tens of thousands of times per day.

        I see op.c got a mountain of work between 5.8.5 and 5.8.8, so now I'm hoping to get approval for testing 5.8.8 to see if the issue clears up. It's still odd that this suddenly became an issue, but perhaps some change in Windows (security patch?) aggravated a known issue in op.c?

        Thanks again!

      This appears to be related to this ActiveState bug report against ActivePerl 809 (built on 5.8.3) and also this Perlbug ticket for 5.8.0 on Linux. It's listed as unconfirmed and medium priority by ActiveState. It's stalled waiting for example code that triggers it on perlbug.

      There seems to be a pretty good explanation of what appears to be going on (although I can't vouch for its accuracy). It seems from the two separate bug reports against two different dot releases on two different platforms that the bug came from upstream of ActiveState and affects at least four dot releases of 5.8 so far. I don't see anything in change logs saying it's been fixed in newer versions, but I'll admit I may have missed it. Also, it may be lumped into one of the "many things fixed" lines somewhere in a perldelta.

      I can't find an ActivePerl built from 5.8.5 so I guess it's pretty certain you're using something else. Another Perl on Windows project or built from source? On cygwin or not?

      Since the bug reports are waiting for an example or more of the problem, I'd suggest replying to stmpeters on the Perl 5 RT system at the above-mentioned bug #34450 about your code. If it turns out to be a known issue fixed in a newer version, at least the bug tracker will be updated to show that and you'll be told what version was first fixed.

      Of course, this being PerlMonks, someone with the skills, time, and access to look into this might be reading the thread right now. I wouldn't hold my breath waiting for that, though.

      Might be good to test a newer version first. If a test box with 5.8.7 or 5.8.8 doesn't seem to fix it, you might consider a bug report after that.


      Christopher E. Stith
        Thank you! +++++++ I'll see if I can somehow boil this down to a test case...if not, at least it's nice to understand what's happening a little better!
