Thanks for responding, BrowserUk, this is really helpful.

$lockfile is definitely not a shared variable.

Yes, I did redact the server and share names. The boss requires that we continue to redact this type of information for security reasons. I can tell you that the server name is the same in every instance, but the share name has been different each time. Here are all of the examples I have in my log files:

20150509-stderr.log:Thread 6 terminated abnormally: Failed to open \\PRODSERVERA\BEYC\INQUEUE\loadqueue.lck: Invalid argument at service.plx line 596.
20150523-stderr.log:Thread 7 terminated abnormally: Failed to open \\PRODSERVERA\FRT4\INQUEUE\loadqueue.lck: Invalid argument at service.plx line 596.
20150704-stderr.log:Thread 10 terminated abnormally: Failed to open \\PRODSERVERA\JPGE\INQUEUE\loadqueue.lck: Invalid argument at service.plx line 596.

The logging is being handled by a Thread::Queue (actually by two - one for STDOUT and one for STDERR), so all of the log entries are in chronological order, although I don't capture specific file system activity at the moment.

In each case, the pattern around what happened is different. In the case of BEYC, the faulting thread loaded files for BEYC, then handled a whole bunch of other recipients for the next 12 hours, then crashed on the very next BEYC file to come in. For FRT4, it loaded a bunch of files, the last of which was loaded using a library call, and then crashed on the very next file that needed to load. For JPGE, it again loaded a bunch of files successfully, but the last file loaded was passed out to a new Perl instance in a system call before the thread crashed on the next file.

Re^3: Problem opening file
by BrowserUk (Patriarch) on Jul 09, 2015 at 17:09 UTC

    Okay. Try adding $^E to your logging; it might make for a clearer picture.

    It would be easier to suggest something if you posted the thread code, but at a very minimum I'd rewrite that open something like this:

    my $retries = 5;
    my $fhLock;
    while( $retries and not open $fhLock, ">", $lockfile ) {
        warn "Failed to open $lockfile: $! [$^E]";
        sleep 1;
        --$retries;
    }

    And then wait until it happens again. That should tell you whether it's a transient problem or not, and perhaps shed more light on the cause.
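
A slightly fuller sketch of the same retry idea, wrapped in a sub so the caller can tell whether the open ever succeeded. The name open_with_retry and its parameters are hypothetical and not taken from service.plx. On Windows, $^E carries the extended OS error; on most other platforms it simply mirrors $!.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption, not part of the original service.plx):
# try to open $path for writing up to $retries times, pausing $delay seconds
# between attempts. Returns the filehandle on success, undef on failure.
sub open_with_retry {
    my ( $path, $retries, $delay ) = @_;

    for my $attempt ( 1 .. $retries ) {
        if ( open my $fh, '>', $path ) {
            return $fh;
        }

        # Copy both error variables straight away, before any other call
        # has a chance to overwrite them.
        my $errno  = "$!";
        my $os_err = "$^E";
        warn "Attempt $attempt of $retries: failed to open $path: $errno [$os_err]\n";

        # No point sleeping after the final attempt.
        sleep $delay if $attempt < $retries;
    }

    return;    # every attempt failed
}
```

It could be called as: my $fhLock = open_with_retry( $lockfile, 5, 1 ) or die "Giving up on $lockfile"; so a thread still dies if the problem turns out not to be transient, but only after the log has recorded every failed attempt.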



      Thanks for your suggestions. I've rewritten the open call along the lines you and marinersk suggested, and will also add $^E to the output.

      I'll have a chat with the boss today as to how much of the code I can share with you, and try to get another update in after lunch.